Abstract

This paper presents a study on data dissemination in unstructured Peer-to-Peer (P2P) 
network overlays. The absence of a structure in unstructured overlays eases the network 
management, at the cost of non-optimal mechanisms to spread messages in the network. 
Thus, dissemination schemes must be employed that allow covering a large portion of 
the network with a high probability (e.g. gossip based approaches). We identify principal
metrics, provide a theoretical model and perform the performance evaluation using a
high performance simulator that is based on a parallel and distributed architecture. A
main point of this study is that our simulation model considers technical implementation
details, such as the use of caching and Time To Live (TTL) in message dissemination, that
are usually neglected in simulations due to the additional overhead they cause. Outcomes
confirm that these technical details have an important influence on the performance
of dissemination schemes and that the studied schemes are quite effective at spreading
information in P2P overlay networks, whatever their topology. Moreover, the practical
usage of such dissemination mechanisms requires fine tuning of many parameters, a
choice among different network topologies and an assessment of behaviors such as free
riding. All this can be done only with efficient simulation tools that support the
network design phase and, in some cases, operate at runtime.

Keywords: Data dissemination, Simulation, Complex Networks, Performance 
Evaluation 


1. Introduction 

Unstructured Peer-to-Peer (P2P) systems have been recognized as a good practice
to build effective distributed applications. This is particularly evident when the peers
composing the network are dynamic, with frequent arrivals and departures. In this
case, the use of agile attachment strategies to create an overlay network (i.e. the network
composed of links representing an interaction/connection between nodes), plus the use
of a simple dissemination protocol to let nodes interact, offers an easy way to manage the
interaction substrate on top of which distributed applications can be executed.
Nodes create links based on an attachment process that does not depend on the “type” of
the involved nodes. Thus, for instance, if we are dealing with a content management system,
links are not created based on the contents owned by peers; rather, links are established
based on other criteria, e.g. arbitrarily.

As concerns the spread of information, an interesting solution is based on gossip.
This epidemic dissemination strategy uses randomized communication that distributes
contents without a specific, content-based routing scheme. Gossip has been recognized as
a robust and scalable communication paradigm to be employed in large-scale distributed
environments. In fact, although its communication costs are usually higher
than those of other, optimized solutions, e.g. tree-based protocols, a gossip-based
dissemination scheme is intrinsically fault tolerant.

There is a vast literature on gossip. Related studies are mainly theoretical, since their
aim is to prove that large-scale networks can reliably and effectively employ these
strategies to disseminate information. For instance, it has been shown that, depending
on the system model, certain gossip-based protocols can achieve a message complexity
around $O(n \log^3 n)$, or even $O(n)$, with high probability. It is worth mentioning that in
the proposed models the behavior of nodes is usually simplified, and several practical
issues that should be taken into consideration when building a distributed system are
not considered. Some other works exploit simulation to evaluate epidemic strategies.
In these works, too, nodes have a very simple behavior. The rationale
behind this choice is usually twofold. Firstly, a simple behavior of nodes/agents often
makes it easy to verify whether an interesting emergent behavior occurs at the whole
system/network level. Secondly, these simplifications yield a lightweight simulation
model that enables the simulator to scale up to large networks. However, in this
case as well, while a general main result is obtained, there is a lack of technical details
that are instead important during the real deployment of these strategies.

Today, the use of parallel and distributed simulation and the advent of multi/many-core
processors make it possible to add more details on the behavior of simulated entities.
Adding such complexity to the simulation gives more emphasis to the impact
of some important algorithmic details and expedients that can affect the dissemination
performance in a P2P overlay network. In this work, we assess the performance of
different dissemination protocols on different P2P overlays, and study the impact of
caching and of the Time To Live (TTL) on message distribution. In our assessment, we
employ parallel and distributed simulation. The metrics employed during the assessment
are the coverage of the network and the delay for disseminating messages. In addition,
a theoretical model is provided that, given a gossip protocol and the topology of the
underlying network overlay, allows estimating the threshold values corresponding to a
phase transition between the ability of a given gossip protocol to spread a message to
a significant set of nodes in the net, and a local dissemination that reaches only a limited
neighborhood of a node.

The contributions of this work are the following. 

• We study a degree dependent dissemination algorithm that relays messages to
nodes based on their degree. We employ different degree dependent functions in
our simulations.

• We provide a theoretical model that, based on the dissemination protocol and the
degree distribution of nodes composing the network, is able to determine the threshold
values for the parameters of the dissemination algorithm. Such a threshold
identifies a phase transition: below the threshold a disseminated message reaches
a small, local fraction of network nodes, while above the threshold the message
reaches a giant component of the network, i.e. a set of nodes of the order of the
network size.


• We study the impact of cache and TTL on data dissemination. 


• We perform a parallel and distributed simulation of dissemination algorithms in
large scale networks; different network topologies are employed (i.e. random graphs,
scale-free networks, Watts-Strogatz small-world networks, k-regular networks). To the best
of our knowledge, this is the first contribution that shows results of large scale
simulations over different topologies, where the dissemination is so highly intensive
and the behavior of nodes includes cache and TTL management. Moreover, contrary to
other typical simulation studies, where a single node acts as the source generating
messages, in our simulations all nodes generate messages concurrently. From a
simulation point of view, this is a more complex problem that mimics network-intensive
applications, e.g. P2P online gaming and distributed virtual
environments, P2P file sharing or wireless sensor networks.


• A preliminary evaluation of the impact of free riding on data dissemination
with different gossip protocols and network topologies is reported.


• To assess the performance of the considered dissemination protocols, we use a metric
termed “overhead ratio”, which measures the total number of delivered messages
(for a given protocol) over the minimum number of messages needed to obtain a
complete coverage (the lower bound), given the considered graph. The rationale
behind this metric is to quantify the overhead, in terms of sent messages, of a
communication protocol. It allows comparing the behavior of different dissemination
strategies over different network topologies.


• Given a set of nodes in a P2P overlay, and the need to create a given overlay, an issue
is how to set up the network in order to guarantee certain communication properties.
Our work helps to understand, during the design phase of a P2P overlay, how to
set network parameters so as to obtain a certain overhead that would guarantee a
certain network coverage and delay. In most cases, this can be done only by exploring
the parameter space of the available dissemination protocols. In practice, this
requires the execution of a high number of simulation runs. This confirms the
need for scalable and efficient simulation tools.


The remainder of this paper is structured as follows. Section 2 presents some related
works available in the literature. In Section 3 we discuss some background needed to
understand the gossip algorithms and the performance assessment. Section 4 presents
the considered dissemination algorithms. Section 5 presents the theoretical model and
employs it to study the ability of the considered dissemination strategies to spread
messages over an overlay network, given its degree distribution. In Section 6 the simulation
testbed is described. Section 7 reports on the simulation experiments we carried out. The
main results from the performance evaluation are discussed more fully in Section 8.
Finally, concluding remarks are reported in Section 9.


2. Related Work 


In this section, we review some works concerned with data dissemination in unstructured
P2P networks. Since the considered schemes are based on gossip-style epidemic
protocols, we focus on gossip-related approaches. A main rationale for this
choice is that gossiping poses some challenges when we try to guarantee a high (possibly
full) network coverage and low delay in scalable and highly intensive scenarios. Gossip
is a simple, yet effective strategy to disseminate information. Its main feature is the use
of randomization to propagate data. It has been proved that in certain contexts this
provides better reliability and scalability than deterministic approaches.

Gossip-based communication can use either push, pull or push-pull schemes. In
a push-based dissemination, it is the sender that decides which nodes will receive
a message it is relaying through the network. Pull-based approaches let receivers trigger
a communication with another node that will send some data. Finally, in push-pull
protocols both nodes gossiping with each other share the data they own.

There is a vast literature on the use of dissemination and gossip-based approaches in
distributed systems. Many works propose its utilization in several application domains.
Examples are information spreading in mobile ad-hoc environments, multiplayer online
games and distributed virtual environments, application multicast [17], opportunistic
networks, multiresolution data representations for sensor networks, publish-subscribe
systems, query processing over XML data [22], resource discovery [23], resource management
in cloud computing and distributed systems [27], social networks, community detection
[30], etc.

Moreover, examples of real systems exist that employ gossip strategies. It is well
known that Amazon S3 uses a gossip protocol to quickly spread server state information
throughout the system [31]. In addition, Amazon’s Dynamo storage system employs
a gossip-based distributed failure detection and membership protocol [32]. The Facebook
team developed Cassandra [33], a distributed storage system employing a gossip
strategy called Scuttlebutt for membership management and for disseminating system
control state messages [34]. This is an anti-entropy gossip-based mechanism, exploited
to guarantee an efficient utilization of CPUs and communication channels. Tribler is an
anonymous open source P2P BitTorrent client that adds keyword search ability to the
BitTorrent file download protocol using gossip.

These solutions are employed in the cloud or within the internal mechanisms of distributed
systems, with a limited number of nodes involved. Instead, we are interested
here in large scale systems composed of thousands of nodes. In this sense, many studies
have been presented on epidemic algorithms to disseminate contents in communication
networks. Due to the need to scale up to very large numbers of nodes,
these studies were mainly theoretical, modeling gossip as a particular instance of general
percolation problems and epidemic spread of viruses. Indeed, gossiping a message means
that a node sends it, with a certain probability function, to nodes it is connected to
(i.e. its neighbors). From a modeling point of view, this is equivalent to imagining a node
that passes a virus (with a given probability) to another node it interacts with. While
elegant, these mathematical approaches do not take into consideration several practical
issues that should be considered when building a distributed system. For instance, an
assumption can be that nodes have full knowledge of the interaction history or, conversely,
that they are completely unaware of previous communications. In this
sense, a gossip strategy can be represented as a Susceptible, Infective, Recovered (SIR)
scheme [36, 37], according to which a (susceptible) node n becomes infected upon reception
of a message and spreads it to its neighbors. Thereafter, n will not relay the
message anymore (in case some neighbor resends the message to it), i.e. n is in a
recovered state. On the other hand, the Susceptible, Infective, Susceptible (SIS) scheme
represents the situation in which a node n always retransmits a received message,
even if it has already done so [36, 37]. In fact, when n receives a message (i.e. n becomes
infected) it sends the message to its neighbors (via gossip) and then it returns to the
susceptible state. In other words, n does not maintain any log of the messages it has
already processed.

As a matter of fact, while it is impractical to assume that nodes can hold
information about relayed messages forever, it is common practice to employ a caching
system that stores a limited number of handled messages (e.g. the newest ones). Indeed, in many
scenarios it is reasonable to assume the need to spread a given item for a limited amount of
time. When a node receives a message, if it already has the message id in its cache, this
means that the message has already been processed. Thus, the message can be discarded.
In some sense, this resembles the Susceptible, Infective, Recovered, Susceptible (SIRS)
model, which is an extension of the SIR model where a recovered node comes back to
a susceptible state after a while (this corresponds to a node that removes a message entry
from its cache). We will show that the caching policy and the cache size are two key
factors for the gossip performance.

The use of theoretical, simplified mathematical models is also due to the difficulty of
finding viable and effective simulators able to run a large number of node instances
in a simulated network. Today, the use of parallel and distributed simulation,
coupled with the use of actual multi/many-core processors, makes this possible. Some
previous works have demonstrated via simulation that gossip strategies can be proficiently
employed to disseminate data in P2P overlay networks. In particular, one of them
presents a study of a gossip-based protocol for computing aggregate values over network
components in a fully decentralized fashion. Simulation results were obtained using a
discrete time stepped simulator. In [1], the authors of this paper made a study on the
performance of gossip dissemination protocols over scale-free networks. While the main
focus of that work was on tools to simulate scale-free networks, the “overhead” metric
was used to assess the performance of a gossip dissemination scheme. A discussion on this
metric is reported in Section 3.3 of this paper. In this work, we extend such results using
a larger set of simulation algorithms, different network topologies, a theoretical
model for the considered schemes, a wider simulation assessment, and an improved
parallel and distributed simulation tool [39].


3. Background

In this section, we provide some background on main issues concerned with gossip in 
unstructured networks. 

3.1. Network Topologies 

In this work, we deal with unstructured networks. In an unstructured network, links
among nodes are established arbitrarily. Nodes locally manage their connections with
others, and these links do not depend on the contents being disseminated. The specific
attachment protocol, which determines how nodes connect to each other, allows building
specific general topologies. These solutions are particularly simple to build and manage,
with low maintenance costs, yet at the price of a non-optimal organization of the overlay.

Different network topologies are considered in this paper. In the following, we describe 
the general characteristics of such topologies, together with the methods employed to 
build them. 


3.1.1. Random Graph Networks 

A random graph corresponds to a general network where links among nodes are
randomly generated. In this case, the Erdős-Rényi (ER) model is employed to build
such random graphs [41]. According to it, a graph is constructed by connecting nodes
randomly. Each edge is included in the graph with a probability p, independently of
every other edge. Thus, p determines how densely the graph is connected.
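
As an illustration of the ER construction just described, the following minimal sketch builds an undirected random graph by testing every possible node pair once. The function name `erdos_renyi` and the adjacency-set representation are assumptions made for this example only; they are not taken from the simulator used in this paper.

```python
import random


def erdos_renyi(n, p, seed=None):
    """Return an undirected G(n, p) graph as {node: set_of_neighbors}."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            # Each of the n*(n-1)/2 candidate edges is kept independently
            # of all the others, with probability p.
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj
```

The quadratic loop is adequate for small graphs; for very large and sparse overlays a generator that skips over absent edges would likely be preferable.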


3.1.2. Scale-Free Networks 

A scale-free network possesses the distinctive feature of having nodes with a degree
distribution that can be well approximated by a power law function. Hence, the majority
of nodes have a relatively low number of neighbors, while a non-negligible percentage of
nodes (“hubs”) exists with higher degrees [1]. The presence of hubs has an important
impact on the connectivity of the net. In fact, the peculiarity of these networks is that
they possess a very small diameter, thus allowing information to propagate in a low
number of hops.

To build this kind of network, we employed the Barabási-Albert (BA) model [42].
The BA model is based on preferential attachment, which produces the well known
rich-get-richer effect. In substance, upon arrival in the network, a new node creates
links with other nodes, and each link is most likely to attach to nodes with higher
degrees. The network begins with an initial network of m nodes. Then, other nodes are
added to the network one at a time. Each new node is connected to m existing nodes
with a probability that is proportional to the number of links that the existing nodes
already have.
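
The preferential attachment step can be sketched as follows. The text does not state how the m initial nodes are linked, so this example assumes that the first newcomer connects to all of them; the function name `barabasi_albert` and the `endpoints` list are likewise illustrative choices, not the authors' implementation.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Return a BA-style graph as {node: set_of_neighbors}."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Assumption: the first newcomer links to all m seed nodes, since
    # degree-proportional sampling is undefined while every degree is 0.
    targets = set(range(m))
    endpoints = []  # every node appears here once per incident edge
    for new in range(m, n):
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints.extend((new, t))
        # A uniform draw from `endpoints` selects a node with probability
        # proportional to its current degree (preferential attachment).
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
    return adj
```

Keeping one list entry per edge endpoint trades memory for a constant-time degree-proportional draw, which is a common way to implement this model.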


3.1.3. Watts-Strogatz Small World Networks 

The Watts and Strogatz (WS) model is a random graph generation model that produces
graphs with small-world properties [43]. It does so by interpolating between an
ER random graph and a regular ring lattice. An initial d-dimensional lattice structure
is used to generate a WS model. Each node in the network is initially linked to its 2d
closest neighbors (for each dimension, a node has 2 neighbors, i.e. its predecessor and its
successor along that dimension). Then, a parameter p is employed as the rewiring
probability: each edge has a probability p of being rewired as a random edge.
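
A minimal sketch of this procedure is given below for the one-dimensional case only (d = 1, i.e. a ring where each node starts with its predecessor and successor). The choice of rewiring one endpoint to a uniformly chosen non-neighbor is an assumption of the example, as is the name `watts_strogatz_ring`.

```python
import random


def watts_strogatz_ring(n, p, seed=None):
    """1-dimensional WS graph (d = 1) as {node: set_of_neighbors}; n >= 3."""
    if n < 3:
        raise ValueError("need at least 3 nodes")
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):  # regular ring lattice: link each node to its successor
        w = (v + 1) % n
        adj[v].add(w)
        adj[w].add(v)
    for v in range(n):  # visit each lattice edge (v, v+1) exactly once
        w = (v + 1) % n
        if w in adj[v] and rng.random() < p:
            # Assumption: keep endpoint v and pick a new endpoint uniformly
            # among the nodes that are not already neighbors of v.
            candidates = [u for u in range(n) if u != v and u not in adj[v]]
            if candidates:
                u = rng.choice(candidates)
                adj[v].discard(w)
                adj[w].discard(v)
                adj[v].add(u)
                adj[u].add(v)
    return adj
```

With p = 0 the ring is returned unchanged, while values of p close to 1 yield a graph that behaves much like a sparse random graph with the same number of edges.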


3.1.4. k-Regular Networks

k-regular networks are those where all nodes start with the same degree k. These
networks are quite common in several P2P systems, where the software running on peers
is configured to have a given number of links in the overlay. This is usually done
for load balancing purposes [44].


3.2. Implementing a Dissemination Protocol 

A dissemination protocol over unstructured networks implies that messages are transmitted
through links among nodes in the overlay. Thus, each node receiving a message
analyzes it (passing this message to the application module, if necessary) and relays
the message to its neighbors, via a dissemination algorithm. Since overlays are general
graphs that may contain cycles, a node can receive the same message multiple times. To
prevent such a message from being relayed forever and to keep the communication among
nodes from being congested by redundant transmissions, two practical implementation
expedients are employed, i.e. i) caching identifiers of already processed messages and
ii) assigning a deadline to the life of a message in the network.


3.2.1. Caching 

Caching is a common practice in all computing related problems. In this context,
caching is exploited by nodes to maintain the identifiers of the most recently processed
messages. This way, if a node receives a message whose identifier is in the node’s cache,
the message can be dropped to avoid redundant message processing. A key factor is
the cache size. In fact, the larger the cache, the easier it is to avoid redundant
retransmissions, but the larger the memory requirements. On the other hand, when dealing
with large scale networks, with all nodes concurrently generating novel messages that
need to be disseminated, it is likely that nodes are required to handle a high number of
messages in a short time interval, thus overwhelming caches. We study the impact of the
cache size on the performance of the considered dissemination protocols.
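
The effect of a bounded cache can be illustrated with the sketch below. It assumes a first-in-first-out eviction policy, which is only one of the possible caching policies; the class name `MessageIdCache` is hypothetical.

```python
from collections import OrderedDict


class MessageIdCache:
    """Bounded cache of message identifiers (FIFO eviction is an assumption)."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("cache size must be positive")
        self.size = size
        self._ids = OrderedDict()  # insertion order doubles as age order

    def __contains__(self, msg_id):
        return msg_id in self._ids

    def add(self, msg_id):
        if msg_id in self._ids:
            return
        if len(self._ids) >= self.size:
            self._ids.popitem(last=False)  # evict the oldest identifier
        self._ids[msg_id] = None
```

Once an identifier has been evicted, a late duplicate of that message is treated as new and relayed again, which is how an undersized cache inflates the number of redundant transmissions.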


3.2.2. Time-To-Live 

Messages are associated with a Time-To-Live (TTL) value. The TTL prevents messages
from being forwarded forever in the net. Each time a message is relayed by a node,
its TTL is decreased. When this value becomes 0, the receiving node does not relay
that message to its neighbors. The tuning of the TTL is an important issue for the
performance of the dissemination protocol. It should be sufficiently large to guarantee
that the message can be spread through the whole network. However, this
aspect is usually not considered in mathematical works. Finding the “optimal” value
for the TTL is very important, and very different values are employed in different works.
For instance, in [1, 45, 46, 47] TTL values range from 6 up to 100. When a flooding
protocol is employed, the TTL can be set equal to the network diameter (which
requires an estimation of this value [24]). However, when we consider the diameter of a
network, we are focusing on the shortest paths among nodes, and in general this does not
mean that the average path length among nodes scales with the diameter value. When
the dissemination protocol is based on gossip, it is not guaranteed that a message from
a source will reach a given node through their shortest path. In this work, we analyze
the impact of the TTL on the network coverage.
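
For small overlays, the diameter that bounds the TTL under flooding can be computed exactly with one breadth-first search per node, as in the sketch below. It assumes a connected, undirected graph stored as a dictionary of neighbor sets; the function names are illustrative, and large networks would need an estimation technique instead of this exhaustive computation.

```python
from collections import deque


def eccentricity(adj, source):
    """Largest hop distance from `source` to any node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj):
    """Exact diameter; assumes `adj` describes a connected undirected graph."""
    return max(eccentricity(adj, s) for s in adj)
```

Under flooding, a TTL equal to this value lets a message follow every shortest path to its end; under gossip a larger TTL may be needed, because messages are not guaranteed to travel along shortest paths.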

3.3. Message Dissemination Evaluation Metrics 

Once a dissemination protocol is employed to spread information, it is important to 
define the metrics to evaluate it. The first considered measure is a common one, and it 
is strictly related to the ability of a communication protocol to disseminate a message 
to a “sufficiently large” subset of the population. This measure comes from percolation 
theory, which describes the behavior of connected clusters in graphs and studies the 
formation of long-range connectivity. A dissemination protocol works well if a message
reaches a giant component of the order of the network size. If we imagine spreading a
message in an infinite network, an infinite number of nodes should receive the message.
As described in more detail in the next sections, the considered protocols refer to a
dissemination probability value that allows deciding if a node sends a message to other 
nodes, or not. Based on the overlay topology (i.e. the nodes’ degree distribution), a 
threshold value exists for this dissemination probability that corresponds to a phase 
transition. Below the threshold, a giant component does not exist (thus, typically a 
message reaches only a local neighborhood of the node that originated the message), 
while above it there exists a giant component. In this paper, we will show that this 
phase transition point can be measured using a theoretical model that is based on the 
considered gossip protocol and the degree distribution of the network overlay (while other 
considered metrics relate to more technical aspects; hence, they can be measured through 
simulation). However, this threshold value should be considered as a sort of lower bound 
for ensuring data dissemination. In fact, the mentioned implementation parameters, 
such as nodes’ cache size and TTL of messages, that are employed to viably deploy 
gossip algorithms in unstructured P2P overlays, can strongly influence the performance 
of a dissemination protocol. We will study their impact through other metrics, measured 
via simulation. 

A desirable property of a dissemination protocol extends the previous
one: the protocol should be able to reach all nodes, and this should happen as quickly as
possible. Thus, we use a measure called coverage, which denotes the fraction of nodes
that actually received the messages. Ideally, we wish to obtain 100% coverage, meaning
that all nodes received all the generated messages. It is difficult to estimate such a value
theoretically, especially when TTL and caching techniques are considered. We therefore
measure coverage through simulation.

The third considered measure is called delay, and represents the average number 
of hops that a message traverses before reaching a node (lower is better). The delay 
is computed as follows: when a message is received by a node for the first time, that 
node records the number of hops the message traversed from its generation. The delay 
is computed as the number of hops, averaged over all nodes which received the message, 
and over all messages sent during a simulation run. 
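
The coverage and delay definitions above can be turned into a small post-processing routine over simulation logs. The sketch below assumes a hypothetical log format, `first_hops`, that maps each message identifier to the hop count recorded by every node at its first reception; it also assumes that the source of a message is not counted among its receivers.

```python
def coverage_and_delay(first_hops, n_nodes):
    """Return (coverage, delay) for one simulation run.

    first_hops: {message_id: {receiver: hops_at_first_reception}}
    """
    if not first_hops or n_nodes < 2:
        raise ValueError("need at least one message and two nodes")
    hops = [h for receivers in first_hops.values() for h in receivers.values()]
    # Assumption: the source is excluded, so every message has
    # n_nodes - 1 potential receivers.
    coverage = len(hops) / (len(first_hops) * (n_nodes - 1))
    # Mean hop count over all (message, receiver) first receptions.
    delay = sum(hops) / len(hops) if hops else float("nan")
    return coverage, delay
```

For instance, with four nodes and two messages, where the first message reaches three nodes and the second only one, the coverage is 4/6 regardless of how many hops were needed.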

It is also important to identify appropriate cost metrics, so that all dissemination
protocols can be compared under the same conditions. In this sense, a useful measure is
the overhead ratio $\rho$, which is measured as follows [1]:

\[
\rho = \frac{\text{Delivered messages}}{\text{Lower bound}}
\]

where “delivered messages” is the total number of messages that are delivered by a specific
dissemination protocol and the “lower bound” is the minimum number of messages (in
each graph) that are necessary to obtain a complete coverage. Thus, the lower bound
represents the number of messages sent by a broadcast protocol which delivers events along
the edges of a spanning tree, and never sends duplicates. The lower bound depends on
the graph and is independent of the dissemination protocol to be used. For example,
in a graph of $n$ nodes in which $m$ different events are generated, the lower bound
on the number of delivered messages is $\Omega(nm)$. Each newly generated message has to
traverse at least $n - 1$ links to eventually reach all nodes in the graph. Observe that $n - 1$
is precisely the number of edges of any spanning tree of a graph with $n$ vertices.
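
As a small worked example of this metric, the sketch below takes the lower bound to be exactly m(n - 1), i.e. one spanning-tree broadcast per generated event, which follows from the spanning tree argument above. The function and parameter names are illustrative.

```python
def overhead_ratio(delivered, n_nodes, n_events):
    """Delivered messages divided by the spanning-tree lower bound."""
    if n_nodes < 2 or n_events < 1:
        raise ValueError("need at least two nodes and one event")
    # Each event must cross at least n_nodes - 1 links to reach every node.
    lower_bound = n_events * (n_nodes - 1)
    return delivered / lower_bound
```

With 100 nodes and 10 events the lower bound is 990 messages, so a protocol that delivers 2970 messages has an overhead ratio of 3: on average each node is reached three times per event.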


4. Dissemination Protocols 

In this section, we describe the considered dissemination protocols; they are basically
all push gossip-based approaches. According to our model, all nodes are able to generate
a new message to be disseminated in the network. When the generation procedure
is invoked at a given node, a single message is created with a certain probability, as
described in Algorithm 1. The generation of a message simulates the occurrence of a
new event produced at a given node that must be propagated. If the message is created,
then it is sent through the net using a DISSEMINATE() procedure (line 6 of the algorithm).
The message is also inserted in a cache (line 5).


Algorithm 1 Generation of a Message

1: function GENERATE()
2:     t ← GENERATIONTHRESHOLD()
3:     if RANDOM() < t then
4:         msg ← CREATEMESSAGE()
5:         CACHE(msg)
6:         DISSEMINATE(msg)
7:     end if


Algorithm 2 Reception of a Message

1: function RECEIVE(msg)
2:     if (NOTCACHED(msg) ∧ msg.ttl > 0) then
3:         CACHE(msg)
4:         msg.ttl ← msg.ttl − 1
5:         DISSEMINATE(msg)
6:     end if


Upon reception of a given message (see Algorithm 2), the receiving node forwards the
message to its neighbors by calling the DISSEMINATE() function (line 5 in the algorithm).
This is done only if the message is not already in the node’s cache and its TTL is still
positive (line 2). In fact, if the message is in the cache, it has already been disseminated;
hence, the node has nothing to do with the message msg. Otherwise, msg is cached (line 3
of Algorithm 2) and transmitted. Needless to say, due to the possible memory constraints
of a node, the cache is limited in size (cache.size).
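
The generation and reception procedures can be sketched together as a small node class. This is an illustration under stated assumptions rather than the simulator's code: the `Node` and `Message` classes, the global id counter, the default parameter values and the use of a fixed-length deque as cache are all hypothetical, and the dissemination strategy is passed in as a callback.

```python
import random
from collections import deque
from dataclasses import dataclass
from itertools import count

_next_id = count()  # hypothetical global source of unique message ids


@dataclass(frozen=True)
class Message:
    id: int
    ttl: int


class Node:
    def __init__(self, disseminate, cache_size=256, threshold=0.1, ttl=8, rng=None):
        self.disseminate = disseminate         # callback: disseminate(node, msg)
        self.cache = deque(maxlen=cache_size)  # oldest ids drop out when full
        self.threshold = threshold
        self.ttl = ttl
        self.rng = rng or random.Random()

    def generate(self):
        """Algorithm 1: create and spread a message with probability threshold."""
        if self.rng.random() < self.threshold:
            msg = Message(next(_next_id), self.ttl)
            self.cache.append(msg.id)
            self.disseminate(self, msg)

    def receive(self, msg):
        """Algorithm 2: relay only unseen messages whose TTL is still positive."""
        if msg.id not in self.cache and msg.ttl > 0:
            self.cache.append(msg.id)
            # A decremented copy is relayed so that other holders of the
            # same message object do not see their TTL change.
            self.disseminate(self, Message(msg.id, msg.ttl - 1))
```

Relaying a copy instead of decrementing the TTL in place is a design choice of this sketch; it matters when one message object is handed to several simulated neighbors.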

4.1. Dissemination #1: Gossip with Fixed Probability

According to the first protocol (Algorithm 3), the node (say $n_i$) randomly selects those
edges through which the message msg must be propagated [1, 48]. Specifically, all of $n_i$’s
neighbors (i.e. $\Pi_i$) are considered and a threshold value $\gamma \le 1$ is maintained, which
determines the probability that msg is gossiped to each neighbor (when $\gamma = 1$ we obtain
a flooding algorithm). At each step the message is propagated from $n_i$ to $\gamma |\Pi_i|$ other
nodes on average. Thus, the higher the node degree, the higher its workload.


Algorithm 3 Dissemination: Gossip with Fixed Prob. of Dissemination (at n_i)

1: function INITIALIZATION()
2:     γ ← CHOOSEPROBABILITY()
3:
4: function DISSEMINATE(msg)
5:     for all n_j ∈ Π_i do
6:         if RANDOM() < γ then
7:             SEND(msg, n_j)
8:         end if
9:     end for
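
A direct transcription of this dissemination step is sketched below. The `send` callback and the `rng` argument are assumptions introduced to keep the example self-contained and testable; they stand in for the transport layer and for the node's random source.

```python
import random


def gossip_fixed_probability(msg, neighbors, gamma, send, rng=random):
    """Relay msg to each neighbor independently with probability gamma."""
    for n_j in neighbors:
        # One independent coin flip per outgoing link.
        if rng.random() < gamma:
            send(msg, n_j)
```

With gamma = 1 every neighbor is selected, which reduces the scheme to flooding; with a smaller gamma the number of relayed copies is gamma times the node degree on average.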


4.2. Dissemination #2: Probabilistic Broadcast

The second dissemination protocol we consider is a probabilistic broadcast scheme
(see Algorithm 4). Once the DISSEMINATE() procedure is called, if the message has
been locally generated at the node and msg still needs to be spread to the network (we
assume this check is performed in FIRSTTRANSMISSION(), line 5), msg is sent to all of the
node’s neighbors (lines 6-8). Conversely, if msg has been received from someone else, the
node decides with a certain probability $\beta$ (defined at the beginning of the protocol)
whether to forward msg (line 5). In the positive case, the message is sent to all of the
node’s neighbors.


Algorithm 4 Probabilistic Broadcast

1: function INITIALIZATION()
2:     β ← PROBABILITYBROADCAST()
3:
4: function DISSEMINATE(msg)
5:     if (RANDOM() < β ∨ FIRSTTRANSMISSION()) then
6:         for all n_j ∈ Π_i do
7:             SEND(msg, n_j)
8:         end for
9:     end if
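
The difference with respect to the fixed probability gossip is that a single random decision covers all neighbors. A minimal sketch follows; the boolean `first_transmission` argument replaces the FIRSTTRANSMISSION() check, and the `send` callback and `rng` argument are assumptions of the example.

```python
import random


def probabilistic_broadcast(msg, neighbors, beta, first_transmission, send, rng=random):
    """Send msg to all neighbors, or to none, after one random decision."""
    # A locally generated message is always broadcast; a received one is
    # forwarded to every neighbor with probability beta.
    if first_transmission or rng.random() < beta:
        for n_j in neighbors:
            send(msg, n_j)
```

The short-circuit evaluation means that no random number is drawn for locally generated messages, which leaves the random stream untouched in that case.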

4.3. Dissemination #3: Degree Dependent Gossip

According to this scheme, a node decides whether to relay the message based on some
specific features of its neighbors. Thus, the gossip probability is not the same for all
nodes, but depends on the degree of the receiving node. If a node has degree $i$, it will
receive a message from a neighbor with probability $\gamma(i)$, which is a function of the
degree. The algorithm for this kind of scheme can be obtained by taking Algorithm 3 and
substituting $\gamma$ with a function $\gamma(i)$, where $i$ is the degree of the considered node
(thus, it is not reported here). The rationale is that nodes with a low degree can
“compensate” for their small number of links with a higher reception probability (i.e.,
neighbors increase their dissemination probability), whereas nodes with a high degree
have a higher probability of receiving a message from one of their neighbors, so their
reception probability can be safely lowered. This last countermeasure is taken to avoid a
flood of redundant messages.

There is a vast set of possible functions that might be employed in the algorithm. In
this work, we experiment with two different $\gamma(i)$ functions, i.e.

$$\gamma_1(i) = \begin{cases} 1 & i = 1, 2 \\ [\ldots] \end{cases}$$

and

$$\gamma_2(i) = \begin{cases} \dfrac{1}{\ln(\alpha i)} & i > \max(2, e/\alpha) \\ 1 & \text{otherwise} \end{cases}$$


where $\alpha$ is a parameter. The rationale behind these functions is that if a node has
a single connection or just two connections (it might be a node in a chain), then it
floods the message, since stopping the gossip could stop the message dissemination into
a network subset. Moreover, as concerns $\gamma_2(i)$, we prevent the inverse logarithmic
function from returning a value outside the interval between 0 and 1. In the performance
evaluation (Section 7), these protocols will be referred to as Degree Dependent Function 1
(DDF1) and Degree Dependent Function 2 (DDF2), respectively. Many other functions can be
tested on the different network topologies, but this is out of the scope of this paper, in
which we prefer to investigate the general approach instead of studying a specific aspect
in depth.
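
As a hedged illustration, the second function can be written in Python as below,
assuming it reads $\gamma_2(i) = 1/\ln(\alpha i)$ for $i > \max(2, e/\alpha)$ and 1
otherwise. The function name `gamma_ddf2` is introduced here for the sketch only.

```python
import math


def gamma_ddf2(degree, alpha):
    """Degree dependent gossip probability, second function (DDF2).

    The guard degree > e/alpha ensures ln(alpha * degree) > 1, so the
    returned value stays strictly between 0 and 1 in the first branch.
    """
    if degree > max(2, math.e / alpha):
        return 1.0 / math.log(alpha * degree)
    return 1.0


# Nodes with one or two links always flood; high-degree nodes are
# contacted with a lower probability.
low = gamma_ddf2(2, 5.0)
high = gamma_ddf2(10, 1.0)
```

Larger values of `alpha` lower the probability faster as the degree grows, which is the
knob studied in the evaluation.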

To obtain information on the degree of neighbors, we assume that nodes exchange
degree information. In particular, degree information is piggybacked within the messages
that a node $n$ relays to its neighbors. This allows nodes to exchange this information
without the need for further control messages. From the communication point of view, the
overhead introduced by this mechanism is a few bytes for each exchanged message.
Furthermore, appropriate techniques can be used to further reduce this overhead (e.g.
sending the node degree only when it changes). In other words, the overhead introduced by
this procedure is in most cases negligible. In the early steps of the simulation,
neighborhood information is missing, thus nodes are forced to broadcast each message to
all neighbors.


5. Modeling Dissemination Protocols 


In this section, we present an analytical model for dissemination strategies. The
model follows a general framework widely exploited in complex networks theory [24].
It shows that dissemination strategies can disseminate information over a network,
and that a proper tuning of the protocol parameters, based on the underlying network
topology, enables reaching a giant component of the network (i.e., messages percolate
through the overlay network). As mentioned in Section 3, such a model allows evaluating
whether a threshold value exists for the parameters, given a specific dissemination
protocol and the topology of the underlying overlay network over which the protocol is run.

In line with typical theoretical models, some simplifications are made on technical
details and parameters. Thus, the theoretical results should be considered as qualitative
outcomes. In any case, trends are confirmed via simulation, where we include the
mentioned technical details not considered in the theoretical model.


5.1. System Model 

We consider push-based communication schemes, in accordance with the schemes
described in Section 4. From a modeling point of view, push communication
protocols resemble the spread of an epidemic in a contact network. Hence, they can be
modeled using the frameworks typical of complex network theory [12].

We are interested here in assessing if it is possible to have a significant coverage of the 
network by tuning properly the dissemination protocol parameters. Since we are dealing 
with very large networks, an approach which is typical in complex network theory is to 
examine infinite networks rather than just large ones. This assumption on the network 
size does not impact the behavior and the modeling of nodes, since peers know only their 
neighbors and manage contents based on this local knowledge. 

Following the presented approach, we assume that links among nodes are randomly
generated, based on a given node degree distribution. A consequence of the random
nature of the attachment process is that, regardless of the node degree distribution, the
probability that one of the second neighbors (i.e. nodes at two hops from the considered
node) is also a first neighbor of the same node goes as $N^{-1}$, where $N$ is the number
of nodes in the overlay. Hence, this situation can be ignored since the number of nodes
is high.


5.2. General Metrics 

We denote with $p_i$ the probability that a peer $n$ has degree equal to $i$, while $q_i$
is the excess degree distribution [37],

$$q_i = \frac{(i+1)\, p_{i+1}}{\sum_j j\, p_j}.$$

Probabilities $p_i$ and $q_i$ represent two similar concepts, i.e. the number of contacts
of a considered peer (its degree), and the number of contacts obtained following a link
(its excess degree), respectively. In the following, we introduce measures obtained by
considering the degree $p_i$ of a node, and considering the excess degree $q_i$ of a link.
In this last case, with a slight abuse of notation we denote all the
probabilities/functions related to the excess degree with the same letter used for the
degree, with an arrow on top of it, just to recall that the quantity refers to a link.

With $\langle p \rangle$ we denote the average degree, which depends on the degree
distribution of the overlay. Then, $\langle q \rangle$ is the mean value of the excess
degree, that is [37]

$$\langle q \rangle = \sum_i i\, q_i = \frac{\langle p^2 \rangle - \langle p \rangle}{\langle p \rangle}. \tag{1}$$
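
The relation between the degree distribution and the excess degree can be checked
numerically. The sketch below assumes the standard definition
$q_i = (i+1) p_{i+1} / \langle p \rangle$ and a small, made-up degree distribution; the
function names are hypothetical.

```python
def excess_degree_distribution(p):
    """Given p[i] = P(degree = i), return q with q[i] = (i + 1) * p[i + 1] / <p>."""
    mean_p = sum(i * pi for i, pi in enumerate(p))
    return [(i + 1) * p[i + 1] / mean_p for i in range(len(p) - 1)]


def mean_of(dist):
    """Mean of a distribution given as a list indexed by value."""
    return sum(i * x for i, x in enumerate(dist))


# Illustrative distribution over degrees 0..3 (not taken from the paper).
p = [0.0, 0.2, 0.5, 0.3]
q = excess_degree_distribution(p)

mean_p = mean_of(p)
second_moment = sum(i * i * pi for i, pi in enumerate(p))
mean_q_direct = mean_of(q)
mean_q_from_moments = (second_moment - mean_p) / mean_p
```

Both ways of computing the mean excess degree agree, which is the content of Eq. (1).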


Given a peer $n$ in charge of relaying a message, the probability that $n$ forwards it to
$i$ of its neighbors is denoted with $f_i$. This measure depends on the particular
probabilistic communication protocol in use. The probability that, following a link, we
arrive at a node that forwards the query to $i$ other nodes is denoted with $\vec{f}_i$.

To proceed with the reasoning, we need to introduce the generating functions for all
these measures. Hence, $G$ is the generating function for the $p_i$ coefficients,
$G(x) = \sum_i p_i x^i$; $\vec{G}$ is the generating function for the $q_i$ coefficients,
$\vec{G}(x) = \sum_i q_i x^i$; $F$ is the generating function for the $f_i$ coefficients,
$F(x) = \sum_i f_i x^i$; and $\vec{F}$ is the generating function for the $\vec{f}_i$
coefficients, $\vec{F}(x) = \sum_i \vec{f}_i x^i$. An interesting observation is that once
one has characterized these $f_i$, $\vec{f}_i$ values and the related generating functions
$F$, $\vec{F}$, it is possible to determine the average amount of receivers. This is
obtained through the reasoning that follows.

Let us denote with $r_i$ the probability that $i$ peers receive a message, starting from a
given node. Similarly, denote with $\vec{r}_i$ the probability that $i$ peers receive the
message, starting from a link. In general, $\vec{r}_i$ can be defined using the following
recurrence,

$$\vec{r}_0 = 0, \qquad \vec{r}_{i+1} = \sum_{j \geq 0} \vec{f}_j \sum_{a_1 + a_2 + \cdots + a_j = i} \vec{r}_{a_1} \vec{r}_{a_2} \cdots \vec{r}_{a_j}.$$

This equation can be explained as follows. It measures the probability that, following a
link, we disseminate the message to $i + 1$ peers. (The case $\vec{r}_0$ is impossible,
since at the end of a link there must be a node.) In general, one peer is the one reached
at the end of the link itself. Then, we consider the probability that the peer forwards
the message over $j$ other links (varying the value of $j$). Each link $k$ allows
disseminating the message to $a_k$ peers, and the sum of all these reached peers equals $i$.

Similarly, we can calculate $r_i$ as follows

$$r_0 = 0, \qquad r_{i+1} = \sum_{j \geq 0} f_j \sum_{a_1 + a_2 + \cdots + a_j = i} \vec{r}_{a_1} \vec{r}_{a_2} \cdots \vec{r}_{a_j}.$$

In this case, we start from the peer itself, considering that it forwards the message over
$j$ of its links; as before, from these $j$ links we can reach $i$ other peers, globally.

The use of generating functions may be of help to handle these two equations [43].
In fact, if we consider the generating functions for $r_i$ and $\vec{r}_i$,

$$R(x) = \sum_i r_i x^i, \qquad \vec{R}(x) = \sum_i \vec{r}_i x^i, \tag{2}$$

then, after some manipulation, we arrive at the following result

$$\vec{R}(x) = x \sum_{j \geq 0} \vec{f}_j \left[\vec{R}(x)\right]^j = x \vec{F}(\vec{R}(x)) \tag{3}$$

and, similarly,

$$R(x) = x \sum_{j \geq 0} f_j \left[\vec{R}(x)\right]^j = x F(\vec{R}(x)). \tag{4}$$

From the generating functions, we might recover the elements $r_i$, $\vec{r}_i$ composing
them. Unfortunately, equations (3) and (4) may be difficult to solve, depending on the
degree probability distribution $p_i$, which controls all the introduced measures.
Actually, we are not that interested in the single values of $r_i$, $\vec{r}_i$. It is
easier and more useful to measure the average number $\langle r \rangle$ of peers that
receive a given message through the dissemination protocol. To this aim, we can employ the
typical formula for generating functions, $\langle r \rangle = R'(1)$ [52]. In fact, taking
the first equation of (2), differentiating and evaluating the result for $x = 1$, and since
$r_0 = 0$, we have

$$R'(x)\big|_{x=1} = \sum_i i\, r_i,$$

which is the mean value related to the distribution of the $r_i$ coefficients. Coefficients
of the introduced generating functions are probabilities, and thus
$F(1) = \sum_i f_i = 1$, and similarly $\vec{F}(1) = 1$, $R(1) = 1$, $\vec{R}(1) = 1$.
Hence, taking (4) and differentiating

$$\langle r \rangle = R'(1) = \left[ F(\vec{R}(x)) + x F'(\vec{R}(x)) \vec{R}'(x) \right]_{x=1} = 1 + F'(1) \vec{R}'(1).$$

Similarly, from (3),

$$\vec{R}'(1) = \left[ \vec{F}(\vec{R}(x)) + x \vec{F}'(\vec{R}(x)) \vec{R}'(x) \right]_{x=1} = 1 + \vec{F}'(1) \vec{R}'(1).$$

Thus,

$$\vec{R}'(1) = \frac{1}{1 - \vec{F}'(1)}.$$

This last equation allows finding the final formula for $\langle r \rangle$,

$$\langle r \rangle = 1 + \frac{F'(1)}{1 - \vec{F}'(1)}. \tag{5}$$

Equation (5) has a divergence when $\vec{F}'(1) = 1$. This implies that in an infinite
network, the messages are spread through the net, reaching an infinite amount of nodes.
In the next subsection, we identify the values of the $f_i$, $\vec{f}_i$ coefficients of
the generating functions, which depend strictly on the topology of the underlying network
overlay and on the particular probabilistic communication protocol in use on top of it.
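
To make the role of the divergence concrete, the sketch below evaluates the final formula
for the mean number of receivers, $\langle r \rangle = 1 + F'(1) / (1 - \vec{F}'(1))$, for
given values of the two derivatives. Returning infinity at or above the threshold is a
modeling convention assumed for this sketch; the function name is hypothetical.

```python
import math


def mean_receivers(f_prime, f_vec_prime):
    """Mean number of receivers below the phase transition.

    f_prime     -- F'(1), mean number of relays from the source node
    f_vec_prime -- derivative at 1 of the link-based generating function
    At or above the threshold (f_vec_prime >= 1) the finite-size formula
    no longer applies, so infinity is returned.
    """
    if f_vec_prime >= 1.0:
        return math.inf
    return 1.0 + f_prime / (1.0 - f_vec_prime)


# Example values chosen for illustration only.
below = mean_receivers(0.4, 0.4)     # 1 + 0.4 / 0.6
critical = mean_receivers(0.5, 1.0)  # diverges
```

As the link-based derivative approaches 1 from below, the mean grows without bound, which
is the signature of the phase transition discussed next.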


5.3. Phase transitions for dissemination strategies 

We consider now the protocols described in the previous section, and we identify 
conditions under which we have a phase transition for message dissemination throughout 
the network. Above the phase transition a vast majority of nodes in the network will 
receive each message, i.e. a high coverage can be obtained. 
5.3.1. Fixed probability

This scheme is based on a gossip probability $\gamma$, which is independent of any feature
of connected nodes. Based on this scheme, the probability $f_i$ that a node $n$ decides to
relay a message to $i$ of its neighbors is

$$f_i = \gamma^i \sum_{j \geq i} p_j \binom{j}{i} (1 - \gamma)^{j-i}.$$

This equation accounts for all probabilities of $n$ having a degree $j \geq i$ and
independently deciding to relay the message to $i$ nodes (each with probability $\gamma$),
chosen among its neighboring set; moreover, the other neighbors do not receive the message
(each with probability $1 - \gamma$).

A similar reasoning can be made to measure the probability that, following a link, we
arrive at a node that forwards the query to $i$ other nodes. This probability is readily
obtained by substituting, in the equation above, $p_j$ with $q_j$, i.e.
$\vec{f}_i = \gamma^i \sum_{j \geq i} q_j \binom{j}{i} (1 - \gamma)^{j-i}$.

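If we consider the generating function $F$ of the $f_i$ coefficients, we have

$$F(x) = \sum_i f_i x^i = \sum_i \gamma^i x^i \sum_{j \geq i} p_j \binom{j}{i} (1 - \gamma)^{j-i} = \sum_j p_j \sum_{i=0}^{j} \binom{j}{i} (\gamma x)^i (1 - \gamma)^{j-i} = \sum_j p_j (\gamma x + 1 - \gamma)^j = G(\gamma x + 1 - \gamma).$$

Now, it is also possible to evaluate the average of the values $f_i$, by calculating the
derivative of $F$ evaluated at $x = 1$, since $F'(1) = \sum_i i f_i$ [52]. We have

$$F'(x)\big|_{x=1} = \frac{d}{dx} G(\gamma x + 1 - \gamma)\Big|_{x=1} = \gamma G'(1) = \gamma \langle p \rangle,$$

where $\langle p \rangle = G'(1)$ is the mean node degree [24, 43]. From a similar
reasoning, $\vec{F}'(1) = \gamma \vec{G}'(1) = \gamma \langle q \rangle$.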

In this case, the condition $\vec{F}'(1) = 1$, under which the message percolates through
the network, is satisfied when

$$\gamma = 1 / \langle q \rangle,$$

i.e., the gossip probability is equal to the inverse of the mean value of the excess degree.
Actually, this is a well-known result, since it corresponds to the critical transmission
probability for the spread of an epidemic in a network. This formula allows measuring the
phase transition, for the fixed probability gossip scheme, in whatever topology of
interest; one just needs a measure (or numerical estimation) of the average excess degree.

Now, $\langle q \rangle$ has been measured for certain network topologies. We can thus
easily provide the phase transition thresholds for entire classes of networks. For
instance, in random graphs with a Poisson degree distribution with mean
$\langle k \rangle$, the excess degree generating function is
$\vec{G}(x) = e^{\langle k \rangle (x - 1)}$. Hence,
$\langle q \rangle_{rg} = \vec{G}'(x)|_{x=1} = \langle k \rangle e^{\langle k \rangle (x-1)}|_{x=1} = \langle k \rangle$.
And thus, we have a phase transition when $\gamma_{rg} = 1 / \langle k \rangle$ (see
Figure 1a). That is, we have a high coverage in a random graph when it is quite likely that
each node will relay a message to at least one among its ($\langle k \rangle$, on average)
neighbors. As mentioned, this formula does not account for how fast the message would
percolate, and does not consider the fact that a message might have an associated TTL,
which can stop the message spread. Indeed, if one sets the TTL equal to (an estimation of)
the network diameter, a message can reach an amount of nodes that is of the order of the
network size. However, in general the threshold value identified by this theoretical model
is a sort of lower bound for having a data dissemination.

Similarly, in a $k$-regular graph, where all nodes have the same degree $k$ ($p_k = 1$,
$p_i = 0$ for $i \neq k$), the average excess degree is $\langle q \rangle_{k\text{-reg}} = k - 1$
(see Eq. (1) and Figure 1b). This is evident, since in such a network an edge will lead to
a node that (by construction) has $k - 1$ other edges. And thus, we have a phase transition
when $\gamma_{k\text{-reg}} = 1/(k - 1)$.
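
The two thresholds can be reproduced numerically from the degree distributions. The
sketch below is an illustration under stated assumptions: the Poisson distribution is
truncated at a maximum degree (60 here, an arbitrary choice that makes the tail
negligible for a mean of 4), and all function names are hypothetical.

```python
import math


def poisson_pmf(mean_degree, max_degree):
    """Poisson degree probabilities p_0..p_max_degree (tail truncated)."""
    return [
        math.exp(-mean_degree) * mean_degree**i / math.factorial(i)
        for i in range(max_degree + 1)
    ]


def mean_excess_degree(p):
    """<q> = (<p^2> - <p>) / <p> for a degree distribution p."""
    first = sum(i * pi for i, pi in enumerate(p))
    second = sum(i * i * pi for i, pi in enumerate(p))
    return (second - first) / first


def critical_gossip_probability(p):
    """Gossip probability at the phase transition, 1 / <q>."""
    return 1.0 / mean_excess_degree(p)


# Random graph with mean degree 4: threshold close to 1/4.
gamma_rg = critical_gossip_probability(poisson_pmf(4.0, 60))

# 4-regular graph: p_4 = 1, threshold 1/(k - 1) = 1/3.
k = 4
gamma_kreg = critical_gossip_probability([0.0] * k + [1.0])
```

For the same mean degree, the regular graph requires a slightly higher gossip probability
than the Poisson random graph, because its mean excess degree is lower.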


Figure 1: Fixed probability gossip (and probabilistic broadcast) - phase transitions of
the $\gamma$ (and $\beta$) parameter as a function of $\langle k \rangle$: (a) random
graph; (b) k-regular graph.


5.3.2. Probabilistic broadcast 

In this case, the node decides whether to forward the received message with a certain
probability $\beta$. However, if the message has been locally generated at the node, it is
sent to all of the node's neighbors. If the message is forwarded, it is always sent to all
neighbors. Since we are considering a large scale network, we can neglect the first relay
operation, in which the node that creates the message sends it to all its neighbors. Then,
for all other nodes, the probability $f_i$ that a node $n$ relays the message to $i$
neighbors is $f_i = \beta p_{i+1}$. In fact, with probability $\beta$, node $n$ performs a
broadcast to all its neighbors (apart from the node from which it has received the
message); hence, $i$ nodes will receive such a message when $n$ has $(i + 1)$ neighbors
(with probability $p_{i+1}$).

Similarly, $\vec{f}_i = \beta q_i$, i.e., we select an edge and we arrive at a node that
has an excess degree $i$. Thus,

$$\vec{F}(x) = \sum_i \beta q_i x^i = \beta \vec{G}(x).$$

Then, $\vec{F}'(1) = \beta \vec{G}'(1) = \beta \langle q \rangle$. Surprisingly (or
probably not), we obtain a formula which is identical to that obtained for the fixed
probability. In the former case, we used a gossip probability $\gamma$, here we have a
broadcast probability $\beta$, but the average amount of receivers depends on the specific
value of a probabilistic parameter and on the average excess degree, which depends on the
topology of the overlay. Summing up, according to the mathematical machinery utilized in
this model, the two approaches behave in the same way.

5.3.3. Degree Dependent Gossip 

In this case, the gossip probability depends on the degree of the nodes. Hence, if
a node $m$ has degree $i$, it will receive a message from a neighbor with a probability
$\gamma(i)$, which is a function of the degree. In order to determine if $n$ relays the
message to a neighbor $m$, we should sum all the possible cases that $m$ has a certain
excess degree, considering the probability to relay the message to a node with such a
degree, i.e. $\Theta = \sum_j q_j \gamma(j)$. With this in view, the probability $f_i$ to
forward a message to $i$ neighbors becomes

$$f_i = \Theta^i \sum_{k \geq i} p_k \binom{k}{i} (1 - \Theta)^{k-i},$$

which is obtained by considering all the possible cases of $n$ having a degree
$k \geq i$ and forwarding the query to $i$ neighbors. Moreover, $n$ does not gossip the
query to its remaining $k - i$ neighbors. Following the same reasoning utilized for other
gossip strategies,

$$\vec{f}_i = \Theta^i \sum_{k \geq i} q_k \binom{k}{i} (1 - \Theta)^{k-i}.$$

Thus, its generating function is

$$\vec{F}(x) = \sum_i \vec{f}_i x^i = \sum_i \Theta^i x^i \sum_{k \geq i} q_k \binom{k}{i} (1 - \Theta)^{k-i} = \sum_k q_k \sum_{i=0}^{k} \binom{k}{i} (\Theta x)^i (1 - \Theta)^{k-i} = \sum_k q_k (\Theta x + 1 - \Theta)^k = \vec{G}(\Theta x + 1 - \Theta).$$

Thus, $\vec{F}'(1) = \Theta \vec{G}'(1) = \Theta \langle q \rangle$, where in this case
$\Theta$ is not a simple parameter, but a function depending on the degree of nodes.
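
The condition above can be evaluated for a concrete case. The sketch below assumes the
second degree dependent function in the form $\gamma_2(i) = 1/\ln(\alpha i)$ for
$i > \max(2, e/\alpha)$ (1 otherwise), takes $\Theta = \sum_j q_j \gamma(j)$ as defined in
this section, and uses a 4-regular overlay as an illustrative topology. Function names are
hypothetical.

```python
import math


def gamma_ddf2(degree, alpha):
    """Assumed form of the second degree dependent function."""
    if degree > max(2, math.e / alpha):
        return 1.0 / math.log(alpha * degree)
    return 1.0


def relay_probability(q, gamma):
    """Theta = sum_j q_j * gamma(j), with q the excess degree distribution."""
    return sum(qj * gamma(j) for j, qj in enumerate(q))


def percolates(q, gamma):
    """True when Theta * <q> exceeds 1, i.e. above the phase transition."""
    mean_q = sum(j * qj for j, qj in enumerate(q))
    return relay_probability(q, gamma) * mean_q > 1.0


# 4-regular overlay: every link leads to a node with excess degree 3.
q_regular = [0.0, 0.0, 0.0, 1.0]

spreads = percolates(q_regular, lambda j: gamma_ddf2(j, 1.0))
dies_out = percolates(q_regular, lambda j: gamma_ddf2(j, 1000.0))
```

With a small `alpha` the relay probability stays high enough for the message to percolate,
while a very large `alpha` pushes the product below 1.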

As an example, Figure 2 shows when a phase transition occurs if we consider a
degree dependent gossip exploiting function $\gamma_1(i)$ of Section 4.3 over random
graphs. In particular, the x-axis reports the average degree, while the y-axis reports the
$\alpha$ value employed in the formula of the gossip strategy. For that value of $\alpha$,
a related degree dependent gossip function is obtained.

6. Simulation Testbed 

This section presents the details of the simulator we employed to evaluate the
performance metrics and assess the dissemination protocols over different network
topologies.

LUNES (Large Unstructured NEtwork Simulator) is an easy-to-use tool for the simulation
of complex protocols on top of large graphs of whatever topology [39]. It is
modular and separates into different software components the tasks of: i) network
creation, ii) implementation of the protocols and iii) analysis of the results. The use of a
modular approach has the advantage of permitting the (re-)use and integration of existing
software tools, and facilitates the update and extensibility of the tool. The flow
of data processing is linear, i.e. a network is created by the network topology creation
module; then a communication protocol is executed on top of such a network by the
protocol simulator; its results are analyzed by the trace analysis module. It is worth
mentioning that all such tools have been designed and implemented to work in parallel
and therefore they are able to exploit all the computational resources provided by
parallel (multi-processor or multi-core) or distributed (e.g. clusters of PCs)
architectures. In other words, while, for instance, a network (generated by the network
topology creation module) is used by the protocol simulator, the network topology creation
module may be active for the generation of another network. Similarly, while the protocol
simulator is running, its outcomes from previous executions can be analyzed by the trace
analysis module. Outcomes from a given module are exploited by the next one via simple
template files (such as the graphviz dot language). These modules are described in
isolation in the rest of the section.

Figure 2: Degree dependent gossip, function 1: phase transition of the $\alpha$ parameter
in random graph networks.

Many other software tools have been used to investigate complex networks; the most
popular of them is PeerSim [54]. While PeerSim has demonstrated good scalability,
and a comparison between LUNES and PeerSim is out of the scope of this paper, it is
worth noting that the main goal of LUNES is to simulate environments that are much
more data intensive than the models that have been studied up to now (e.g. in terms of
the total number of messages that must be delivered in the network). In other words,
we aim to deal with simulation models that cannot be addressed using a centralized
(Java-based) approach such as the one in PeerSim, i.e. all cases in which the usage of
multi-core CPUs, clusters of PCs or High Performance Computing architectures is necessary.

6.1. Network Topology Creation 

LUNES is able to import the graph topologies generated by other tools. In the current
version of LUNES, we employ igraph, a well-known tool for creating and manipulating
undirected and directed graphs [55]. It includes algorithms for network analysis
and graph theory and allows handling graphs with millions of vertices and edges. The
graphs generated by igraph (or other tools) can be directly used for protocol simulation
or, much more often, are stored in "corpuses", that are collections of homogeneous graphs.
Each corpus can be seen as a testbed environment in which the simulator compares
the behavior and outcomes of protocols under exactly the same conditions. For example,
given a specific dissemination protocol, different execution parameters (e.g. dissemination
probability) can be tested to find the best configuration with respect to the
desired dissemination properties. To obtain results that are statistically sound, the
evaluation of each metric (i.e. coverage, delay, overhead ratio) requires the execution of
multiple independent runs. This means that, in LUNES, each graph that is part of a
corpus is the configuration for an independent run. From the computational complexity
viewpoint, this means that the size of the problem is enormously increased with respect
to a single run that tests a specific configuration of a given dissemination protocol.

It is worth pointing out that the overlay creation and management does not consider
issues concerned with the underlying physical network and the proximity of nodes. This is a
common practice during the evaluation of a P2P protocol over an unstructured overlay
(e.g. [56]). Indeed, adding variables related to physical networks,
such as network proximity and variable delay in message transmission, would increase
the complexity of the model, hence making it more difficult to extract and compare some
general results related to the performance of a dissemination protocol in an overlay. Thus,
it is simpler and more effective to count the amount of hops, as a measure of the delay,
rather than measuring network latencies that vary depending on the physical mapping
of the overlay in a geographical network.


6.2. Protocol Simulation 

In LUNES, the simulation services are delegated to the ARTIS middleware and the
GAIA framework [57, 58]. This means that, in the implementation of new protocols to
be simulated in LUNES, there is no need for dealing with low-level simulation details.
The only Application Programming Interface (API) used in LUNES is quite high level
and is provided by GAIA. Furthermore, for the implementation of new protocols LUNES
already offers a set of primitives and functions that can be used and modified without the
need of starting from scratch. For example, in the current version all the most common
features of dissemination protocols are already implemented and adding new variants or
more complex protocols is straightforward.


6.3. Trace Analysis 

From the performance and scalability viewpoint, the most demanding tasks are the
protocol simulation and the trace analysis. The trace analysis has been kept separate
from the simulation tasks, and some specific software tools have been implemented for it.
The simulation of a network with a few hundred nodes, for the time necessary for studying
some common properties, can generate a huge amount of simulation traces that have to
be stored, parsed and analyzed (in the order of a few gigabytes and, for the performance
evaluation shown in this paper, up to 300 million delivered messages per run). This
means that even very simple metrics used for the performance evaluation of the simulated
protocol can require a lot of effort. In the current version of LUNES, this task is
implemented using a set of shell scripts and some specific tools that have been
implemented in the C language for efficiency. This mix is both quite efficient and easy to
extend and personalize. We have intentionally avoided building a monolithic application,
in order to provide users with an easily customizable tool.


6.4. Time Evolution

LUNES exploits a time-stepped approach to perform simulations. One reason is that
this choice simplifies the deployment of the simulation over parallel and distributed
simulation architectures. Moreover, it allows exploiting the load balancing techniques and
the simulation entity migration approaches offered by the ARTIS middleware and the GAIA
framework. In order to simulate asynchronous scenarios using time-stepped simulations, a
reasonable time granularity must be identified, so that the size of each timeslot allows
properly handling successive events that occur in time, thus guaranteeing a correct
ordering for subsequent events.
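
To illustrate what a time-stepped evolution looks like, the sketch below floods a single
message over a small overlay, delivering at step t+1 what was sent at step t and
decrementing a TTL at every hop. This is not LUNES code: it assumes plain flooding, an
unbounded per-node memory of seen messages, and a hypothetical function name.

```python
def flood_time_stepped(adjacency, source, ttl):
    """Return {node: timestep of first reception} for a TTL-limited flood.

    adjacency maps each node to the list of its neighbors. All messages in
    flight advance by exactly one hop per timestep.
    """
    first_seen = {source: 0}
    in_flight = [(source, ttl)]  # (node holding a copy, remaining TTL)
    step = 0
    while in_flight:
        step += 1
        next_in_flight = []
        for node, remaining in in_flight:
            if remaining == 0:
                continue  # TTL expired: the copy is discarded
            for neighbor in adjacency[node]:
                if neighbor not in first_seen:  # duplicates are dropped
                    first_seen[neighbor] = step
                    next_in_flight.append((neighbor, remaining - 1))
        in_flight = next_in_flight
    return first_seen


# A 4-node chain has diameter 3: a TTL of 2 cannot cover it, a TTL of 3 can.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
partial = flood_time_stepped(chain, 0, 2)
full = flood_time_stepped(chain, 0, 3)
```

The reception timestep doubles as the hop count, which is the delay measure adopted in the
evaluation.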

As concerns P2P systems and complex networks, it is quite common to simulate them
using a time-stepped approach.


7. Performance Evaluation 

In this section, the different dissemination protocols are evaluated on top of the graph
topologies introduced previously. Each topology is studied by means of a corpus that is
composed of 10 graphs generated using the igraph generators reported in Table 1. For all
the charts in this section, each point is obtained by averaging the results
on the graphs of the specific corpus. Confidence intervals are very narrow and are left
out from the figures for better readability.


Topology      igraph generator method       nodes   edges per node   edges
Random        igraph_erdos_renyi_game       500     2, 3, 4          1000, 1500, 2000
Scale-free    igraph_barabasi_game          500     2, 3, 4          1000, 1500, 2000
Small-world   igraph_watts_strogatz_game    500     2, 3, 4          997, 1494, 1990
k-regular     igraph_k_regular_game         500     4, 6, 8          1000, 1500, 2000

Table 1: igraph generators used for building the graph corpuses, main parameters.

Each simulation run is 1000 timesteps long, an amount of steps that is necessary to 
avoid transient effects (e.g. the cache efficiency). Each node in the network can generate 
new messages during the whole simulation lifespan except for the last TTL timesteps. If 
nodes are allowed to generate new messages in the very last timesteps then these messages 
would not have the possibility to reach their destination before the simulation ends. 
Clearly, this must be avoided because it would affect the measured coverage. The time 
between successive messages is generated according to a typical exponential distribution 
(mean value 10 timesteps in this performance evaluation). As already mentioned, each 
node implements a cache structure with the aim of reducing the number of duplicate 
messages in the network. This cache is managed using the Least Recently Used (LRU) 
replacement algorithm; the cache size is one of the parameters that will be studied in the 
following. The study of more specific (and efficient) cache replacement algorithms is left 
as future work. 
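
A duplicate-suppression cache with LRU replacement can be sketched in a few lines. The
class below is an assumption-laden illustration (the class and method names are
hypothetical, and only message identifiers are stored), not the data structure used in
the simulator.

```python
from collections import OrderedDict


class MessageCache:
    """Fixed-size cache of message identifiers with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._ids = OrderedDict()  # oldest entry first, most recent last

    def seen_before(self, msg_id):
        """Return True for a duplicate; otherwise record msg_id and return False."""
        if msg_id in self._ids:
            self._ids.move_to_end(msg_id)  # refresh recency on a hit
            return True
        self._ids[msg_id] = None
        if len(self._ids) > self.capacity:
            self._ids.popitem(last=False)  # evict the least recently used id
        return False


# A node would relay a message only when seen_before() returns False.
cache = MessageCache(capacity=2)
results = [cache.seen_before(m) for m in (1, 2, 1, 3, 2, 3)]
```

In the usage example, identifier 2 is evicted when 3 arrives, so its second copy is no
longer recognized as a duplicate; this is the effect of an undersized cache on overhead.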
Another main model parameter is the lifetime of each message in the network
(Time-To-Live, TTL). We study how this parameter influences the performance of the
protocols but, as a rule of thumb, when not stated otherwise it is equal to 130% of the
maximum diameter of all the graphs in a given corpus. It is clear that a TTL that is lower
than the graph diameter does not permit a full coverage and, conversely, a too large TTL
could lead to some extra overhead. Given its importance, this last point will be addressed
in the first part of the following performance evaluation.

To foster the reproducibility of our experiments, all the source code used in this
performance evaluation, and the raw data obtained in the execution of the experiments, are
freely available on the research group website [58].

7.1. Random Graph Networks 

Figures 3, 4 and 5 show the behavior of the Fixed Probability (FP) and Probabilistic
Broadcast (PB) protocols on random graphs with 1000, 1500 and 2000 edges. On the
left side of each figure, we show the coverage obtained with a given overhead ratio,
while on the right side we show the corresponding delay. In both cases, all the results
are obtained using an indirect method. In fact, FP and PB are executed varying the
dissemination probability and collecting the resulting overhead ratio, coverage and delay.
This means that for each set of figures (i.e. (a) and (b)), 1000 simulation runs have been
executed (that is, 100 different dissemination probabilities and, for each of them, 10
independent runs). In this first set of experiments, the cache size has been set to 256
positions.

Both algorithms are able to provide full dissemination (i.e. 100% coverage) but FP 
has higher coverage for medium overhead values. Because of this, for applications that 
do not require a full coverage, FP is a better choice than PB. On the other hand, the 
downside of FP is the slightly higher delay with respect to PB. Indeed, a reduced overhead 
corresponds to a reduced amount of message copies (for the same data) being transmitted 
in the overlay and a higher delay. In fact, the higher the amount of copies of the same 
message spread through the network, the lower the delay (number of hops to reach a 
given node). 

Increasing the number of edges, the overhead required to obtain full dissemination also
increases. This is due to the fact that the higher the number of edges in the network, the
higher the average node degree, and thus the higher the amount of duplicate messages
generated by the dissemination. Clearly, the caching strategy reduces the retransmission
of duplicates (since it avoids the retransmission of cached messages), yet without solving
the problem entirely. Indeed, a message can travel through different paths to reach a
node, and increasing the amount of edges in a random graph proportionally increases
the probability of this situation.

In Figures 3, 4 and 5, the (b) charts show a rapid increment of the delay (depending
on the overhead) followed by a progressive decrement. In general, this trend is common to
all the experiments we made. An explanation seems to be as follows. With low overheads,
only a few paths are enabled for sent messages, with local disseminations; this results in
short delays for transmitted messages. Indeed, by looking at the (a) charts in these
figures, with such overheads a message is able to reach, on average, less than 50% of the
network nodes. Then, the higher the overhead, the higher the probability of dissemination
and the higher the path lengths of transmitted messages. Thus, we measure higher delays.
However, if we further increase the overhead, we have a tipping point above which the
higher dissemination probability allows messages to travel through alternative paths in
the network, with the effect of lowering the average path delay between pairs of nodes.

Figure 6a shows the effect of different cache sizes on the FP protocol; the testbed 
is a random graph network composed of 500 nodes and 1000 edges. The cache efficiency 
clearly has a big effect on the dissemination overhead. In fact, the role 
of the cache is to “absorb” as many duplicated messages as possible. If a cache is too 
small with respect to the number of different messages that are in the network, then 
these duplicated messages continue to flow until their TTL becomes 0, at which point 
they are discarded. For example, with a cache of 16 positions the overhead to obtain full 
coverage is 34.24. Increasing the cache size lowers the overhead needed for a given 
coverage level. The cache efficiency, however, has a limit: the points for cache sizes 256 
and 512 overlap. This means that a cache of 256 positions, in this specific scenario, is 
enough to obtain the best possible caching effect (all the duplicated messages are 
absorbed by the cache). In the rest of this section, all experiments are executed with 
cache = 256 positions. Even if only a few bytes are necessary to identify each generated 
message in a dissemination, we do not think that storing (in each node) all the unique 
messages that have reached the node would be a feasible approach. In our view, this 
approach would limit the scalability with respect to the number of nodes and generated 
messages. 
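
The role of the cache described above can be illustrated with a minimal sketch. The class name, the method name and the FIFO eviction policy are assumptions made for illustration only; the replacement policy actually used in the simulator is not restated here.

```python
from collections import OrderedDict


class MessageCache:
    """Fixed-size cache of message identifiers (FIFO eviction is an assumption)."""

    def __init__(self, size):
        self.size = size
        self._ids = OrderedDict()

    def seen_before(self, msg_id):
        """Return True for a duplicate; otherwise record the id and return False."""
        if msg_id in self._ids:
            return True  # duplicate absorbed by the cache
        if len(self._ids) >= self.size:
            self._ids.popitem(last=False)  # evict the oldest identifier
        self._ids[msg_id] = None
        return False


cache = MessageCache(size=2)
results = [cache.seen_before(m) for m in (1, 1, 2, 3, 1)]
```

In this toy run the second copy of message 1 is absorbed, but after messages 2 and 3 have filled the two positions, a later copy of message 1 is no longer recognised and would be relayed again. This is the effect that an undersized cache has on the overhead.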

The TTL dimensioning is another key point in the evaluation of dissemination protocols. 
As already mentioned, a TTL value that is lower than the graph diameter would make 
it impossible to obtain full dissemination; but what is the effect of an excessively large 
TTL value? The answer is that it depends on the cache efficiency. If the cache is unable 
to absorb duplicate messages, then the larger the TTL the longer the messages stay in 
the network (i.e. increasing the overhead). Figure 6b shows that when the cache size is 
adequate (e.g. 256 positions) the effect of TTL on the overhead is negligible. This means 
that setting the TTL to 130% of the max diameter of all the graphs in a given corpus is 
a correct assumption. 
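
As a small illustration of this dimensioning rule, the sketch below computes a TTL as 130% of a given maximum diameter and shows the per-hop TTL handling. The function names are hypothetical, and rounding up to the next integer is an assumption, since the rounding rule is not stated in the text.

```python
def dimension_ttl(max_diameter, percent=130):
    """TTL as a percentage of the largest diameter, rounded up (rounding is assumed)."""
    # Integer arithmetic avoids floating point surprises when rounding up.
    return (percent * max_diameter + 99) // 100


def next_ttl(ttl):
    """Return the TTL for a relayed copy, or None when the message must be discarded."""
    return ttl - 1 if ttl > 0 else None


ttl_for_diameter_7 = dimension_ttl(7)
ttl_for_diameter_13 = dimension_ttl(13)
```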

In Figures 7, 8 and 9 the FP dissemination is compared with DDF1 and DDF2. 
In all cases, the degree dependent protocols perform better than FP. Table 2 reports 
the overhead (and delay) that the different dissemination protocols need in order to 
obtain a given coverage level (100%, 99%, 90%, 75%) in a random graph with 1000 
edges. All protocols have almost the same overhead for a full dissemination. Both DDF1 
and DDF2 are better for partial coverages. 


Overhead (and delay) for a given coverage 

Algorithm   100.0%        99.0%         90.0%         75.0% 
FP          3.00 (4.65)   2.74 (4.86)   1.80 (6.10)   1.20 (7.62) 
PB          3.00 (4.65)   2.84 (4.76)   2.03 (5.54)   1.38 (6.49) 
DDF1        2.99 (4.66)   2.24 (5.59)   1.48 (7.72)   1.06 (9.11) 
DDF2        2.99 (4.67)   2.16 (5.88)   1.48 (7.74)   1.07 (8.99) 

Table 2: Random graph networks, 500 nodes, 1000 edges, max diameter=10, TTL=16, 
cache=256. 


When the number of edges per node is increased, the gap between the algorithms seems 
to shrink, but DDF2 is still slightly better than the other protocols (Table 3). 


Overhead (and delay) for a given coverage 

Algorithm   100.0%        Overhead vs. best 
FP          5.00 (3.66)   +4.38% 
PB          5.00 (4.91)   +4.38% 
DDF1        5.00 (3.66)   +4.38% 
DDF2        4.79 (3.72)   best 

Table 3: Random graph networks, 500 nodes, 1500 edges, max diameter=7, TTL=10, 
cache=256. 

With 2000 edges (Table 4) both DDF1 and DDF2 are better than FP and PB. In this 
case, the overhead needed by FP and PB to get full coverage is 40.56% higher than that 
of DDF2 (equivalently, DDF2 needs about 29% less overhead than FP and PB). 
As usual, the overhead reduction is obtained at the cost of a moderate increase in the 
delay (+10.59%). These results suggest that a degree dependent gossip strategy should 
be used when coverage is the main metric to pursue. In fact, given a certain overhead, 
degree dependent strategies provide higher coverages, yet at the cost of a slightly higher 
delay. 
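
The percentages quoted for the 2000-edge random graph can be rechecked from the rounded overhead and delay values reported for FP and DDF2 at full coverage. The helper name below is hypothetical; the snippet is only a worked arithmetic check.

```python
def relative_gap(value, reference):
    """Percentage by which `value` exceeds `reference`."""
    return 100.0 * (value / reference - 1.0)


# Full-coverage values for the 2000-edge random graph: overhead (delay).
fp_overhead, fp_delay = 7.00, 3.21
ddf2_overhead, ddf2_delay = 4.98, 3.55

overhead_gap = relative_gap(fp_overhead, ddf2_overhead)   # FP relative to DDF2
delay_gap = relative_gap(ddf2_delay, fp_delay)            # DDF2 relative to FP
overhead_saving = 100.0 * (1.0 - ddf2_overhead / fp_overhead)

print(f"FP overhead vs DDF2: +{overhead_gap:.2f}%")
print(f"DDF2 delay vs FP:    +{delay_gap:.2f}%")
print(f"DDF2 overhead saving: {overhead_saving:.2f}%")
```

Note that the +40.56% figure measures how much more overhead FP needs with respect to DDF2; read the other way round, DDF2 saves a little under 29% of the FP overhead.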

Also in the case of the DDF protocols, the results shown in the figures are obtained 
using an indirect method. In this case, the parameter a of the DDF protocols (defined 
earlier in the paper) has been varied with the goal of obtaining full coverage with the 
lowest possible overhead. This is done using a sampling of the possible values of a and 
a fine-grained exploration of the values near the point of interest (that is, where full 
coverage is reached). Again, the number of runs required for each evaluation is huge and 
therefore a high simulator efficiency is a prerequisite. 
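
The two-stage exploration described above (a coarse sampling followed by a fine-grained search near the point of interest) might be sketched as follows. Everything here is an assumption made for illustration: `evaluate` is a hypothetical stand-in for a batch of simulation runs returning the coverage obtained with a given parameter value, coverage is assumed to be non-decreasing in the parameter, and the step counts are arbitrary.

```python
def first_hit(evaluate, low, high, steps, target):
    """Scan [low, high] in equal steps; return (previous, current) at the first hit."""
    previous = low
    for i in range(steps + 1):
        x = low + (high - low) * i / steps
        if evaluate(x) >= target:
            return previous, x
        previous = x
    return None


def coarse_to_fine(evaluate, low, high, coarse_steps=10, fine_steps=20, target=1.0):
    """Smallest sampled parameter value whose coverage reaches `target`, or None."""
    hit = first_hit(evaluate, low, high, coarse_steps, target)
    if hit is None:
        return None  # full coverage never reached in the sampled range
    left, right = hit
    refined = first_hit(evaluate, left, right, fine_steps, target)
    return refined[1] if refined else right


# Toy stand-in for the simulator: full coverage from 0.372 upwards.
toy_coverage = lambda a: 1.0 if a >= 0.372 else a
best = coarse_to_fine(toy_coverage, 0.0, 1.0)
```

With a real simulator each call to `evaluate` would hide many runs, which is why the total number of runs grows quickly and simulator efficiency matters.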


Overhead (and delay) for a given coverage 

Algorithm   100.0%        Overhead vs. best 
FP          7.00 (3.21)   +40.56% 
PB          7.00 (3.21)   +40.56% 
DDF1        6.29 (3.30)   +26.30% 
DDF2        4.98 (3.55)   best 

Table 4: Random graph networks, 500 nodes, 2000 edges, max diameter=6, TTL=8, 
cache=256. 


7.2. Scale-Free Networks 

Figures 10, 11 and 12 show the behavior of the FP and PB protocols on scale-free 
networks with 997, 1494 and 1990 edges. These correspond to a graph generation with 
2, 3 and 4 edges for each node when using the Barabasi-Albert model. Also in this case, 
FP gets better results than PB for partial coverage, and also for full coverage when the 
number of edges per node is larger than 2. 

DDF1 performs slightly better than the other dissemination algorithms for 2 edges per 
node (Figure 13). With 3 and 4 edges per node (Figures 14 and 15), DDF2 is by far the 
best solution and, in this case, the gain is in the order of 30% for a full dissemination 
(Tables 6 and 7). Actually, we noticed that when considering mid-range coverage levels, 
the gain is quite relevant (not shown in the tables for the sake of conciseness). 


Overhead (and delay) for a given coverage 

Algorithm   100.0%        Overhead vs. best 
FP          2.99 (3.30)   +6.40% 
PB          2.99 (3.30)   +6.40% 
DDF1        2.81 (3.39)   best 
DDF2        2.93 (3.38)   +4.27% 

Table 5: Scale-free networks, 500 nodes, 997 edges, max diameter=7, TTL=10, 
cache=256. 


Overhead (and delay) for a given coverage 

Algorithm   100.0%        Overhead vs. best 
FP          4.77 (2.78)   +31.04% 
PB          4.87 (2.77)   +33.79% 
DDF1        4.73 (2.78)   +29.94% 
DDF2        3.64 (3.03)   best 

Table 6: Scale-free networks, 500 nodes, 1494 edges, max diameter=5, TTL=7, 
cache=256. 


Overhead (and delay) for a given coverage 

Algorithm   100.0%        Overhead vs. best 
FP          6.33 (2.54)   +29.44% 
PB          6.47 (2.53)   +32.31% 
DDF1        6.17 (2.54)   +26.17% 
DDF2        4.89 (2.61)   best 

Table 7: Scale-free networks, 500 nodes, 1990 edges, max diameter=4, TTL=6, 
cache=256. 


7.3. Small-World Networks 

The dissemination on small-world networks, with a rewiring probability set to 0.1 
(this is a typical value to create a small-world network), further confirms that FP is 
better than PB for partial and full coverage (Figure 16). The gap between FP and PB 
increases with a higher number of edges per node (Figures 17 and 18). 

In this specific scenario, DDF1 and DDF2 are unable to get better results than FP for 
partial coverage (Figures 19, 20 and 21), but the gain is evident for full coverage 
(Tables 8, 9 and 10). When we consider a partial coverage, we are dealing with 
the need to widely spread a content over a network; when the network is a small-world, 
the topology itself provides links to reach different portions of the network. Thus, 
a simple gossip strategy is quite effective. For a full coverage, instead, a dissemination 
procedure that floods the message when a node has just one or two links, as degree 
dependent protocols do, prevents some “peripheral” nodes from being left out during the 
dissemination. 
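
The behavior mentioned above (flooding when a node has only one or two links, gossiping otherwise) can be pictured with the following sketch. It is deliberately not the definition of DDF1 or DDF2, which is given elsewhere in the paper; the function name, the threshold and the step-shaped rule are hypothetical, and the sketch simply takes a degree as input without committing to whose degree it is.

```python
def forward_probability(degree, base_p, low_degree=2):
    """Hypothetical degree dependent rule (NOT the paper's DDF1/DDF2 functions).

    Nodes with very few links always forward, so that peripheral nodes are
    not cut off; all other nodes gossip with the base probability.
    """
    return 1.0 if degree <= low_degree else base_p


degrees = [1, 2, 3, 8]
probabilities = [forward_probability(d, base_p=0.6) for d in degrees]
```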


Overhead (and delay) for a given coverage 

Algorithm   100.0%        Overhead vs. best 
FP          2.97 (6.24)   +3.84% 
PB          3.00 (6.20)   +4.89% 
DDF1        2.88 (6.38)   +0.69% 
DDF2        2.86 (6.58)   best 

Table 8: Small-world networks, 500 nodes, 1000 edges, rewiring probability=0.1, max 
diameter=13, TTL=17, cache=256. 


Overhead (and delay) for a given coverage 

Algorithm   100.0%        Overhead vs. best 
FP          4.55 (4.74)   +10.16% 
PB          4.75 (4.67)   +15.01% 
DDF1        4.20 (4.92)   +1.69% 
DDF2        4.13 (5.04)   best 

Table 9: Small-world networks, 500 nodes, 1500 edges, rewiring probability=0.1, max 
diameter=9, TTL=12, cache=256. 


Overhead (and delay) for a given coverage 

Algorithm   100.0%        Overhead vs. best 
FP          5.53 (4.17)   +10.60% 
PB          6.02 (4.08)   +20.40% 
DDF1        5.20 (4.27)   +4.00% 
DDF2        5.00 (4.37)   best 

Table 10: Small-world networks, 500 nodes, 2000 edges, rewiring probability=0.1, max 
diameter=7, TTL=10, cache=256. 


7.4. K-regular Networks 

As expected, the very regular structure of k-regular networks has an impact on the 
performance of the dissemination algorithms. In more detail, FP is, as usual, slightly 
better than PB (Figures 22, 23 and 24). 

The results of DDF1 and DDF2 are, as expected, practically the same as those of FP 
(Tables 11, 12 and 13 and Figures 25, 26 and 27). This is easy to explain given that 
DDF1 and DDF2 are adaptive variants of FP. In other words, the degree dependent 
algorithms are unable to tune the dissemination probability in a network in which every 
node has the same degree by construction. 


Overhead (and delay) for a given coverage 

Algorithm   100.0%        Overhead vs. best 
FP          2.75 (5.27)   best 
PB          2.88 (5.12)   +4.72% 
DDF1        2.75 (5.27)   best 
DDF2        2.75 (5.27)   best 

Table 11: K-regular networks, 500 nodes, 1000 edges, max diameter=8, TTL=11, 
cache=256. 


Overhead (and delay) for a given coverage 

Algorithm   100.0%        Overhead vs. best 
FP          4.05 (4.06)   best 
PB          4.20 (4.01)   +3.70% 
DDF1        4.06 (4.06)   +0.24% 
DDF2        4.05 (4.06)   best 

Table 12: K-regular graphs, 500 nodes, 1500 edges, max diameter=6, TTL=8, 
cache=256. 


Overhead (and delay) for a given coverage 

Algorithm   100.0%        Overhead vs. best 
FP          4.99 (3.59)   best 
PB          5.25 (3.54)   +5.21% 
DDF1        5.00 (3.59)   +0.20% 
DDF2        4.99 (3.59)   best 

Table 13: K-regular graphs, 500 nodes, 2000 edges, max diameter=5, TTL=7, 
cache=256. 


7.5. Free Riding 

Free riding is a common behavior in P2P systems. That is, some nodes benefit 
from the information provided by the other nodes without offering anything in return to 
the network. It is clear that the effect of free riding on the performance of the protocols 
can be severe [59]. In the case of data dissemination, this means that some nodes (i.e. 
free riders) generate new messages to be delivered in the network but refuse to forward 
the incoming messages originated by other nodes. 
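
The free rider behavior just described can be illustrated with a small, self-contained sketch of one gossip dissemination. This is not the simulator used in the paper: the function name, the per-neighbor forwarding rule and the idealised duplicate filter are all assumptions chosen to keep the example short.

```python
import random
from collections import deque


def disseminate(adj, source, p, ttl, free_riders=frozenset(), rng=None):
    """Return the set of nodes reached by one message generated at `source`.

    Assumed model: every relay forwards to each neighbor independently with
    probability p; free riders relay only the messages they generated.
    """
    rng = rng or random.Random(0)
    reached = {source}
    queue = deque([(source, ttl)])
    while queue:
        node, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # TTL expired: the message is not relayed further
        if node != source and node in free_riders:
            continue  # free riders do not forward other nodes' messages
        for neighbor in adj[node]:
            if neighbor in reached:
                continue  # idealised duplicate filter (assumption)
            if rng.random() < p:
                reached.add(neighbor)
                queue.append((neighbor, hops_left - 1))
    return reached


# Toy topology: a path 0-1-2-3-4, flooded (p=1) from node 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
full = disseminate(path, source=0, p=1.0, ttl=10)
blocked = disseminate(path, source=0, p=1.0, ttl=10, free_riders={2})
coverage_with_free_rider = len(blocked) / len(path)
```

On this path a single free rider in the middle cuts off every node behind it, because there is no alternative route. Topologies with more alternative paths are less exposed to this effect.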

The aim of this section is to investigate the effect of free riding with respect to a 
specific metric (i.e. coverage) in the presence of different network topologies and 
dissemination algorithms. For the sake of brevity, only a specific network configuration 
(i.e. 500 nodes, 3 edges per node) and the best dissemination algorithms resulting from 
the previous evaluation (i.e. FP and DDF2) have been tested. 


7.5.1. Random Networks 

Figure 28 shows the effect of a given percentage of free riders on the coverage 
that can be obtained for a specific overhead ratio in random networks. Figure 28a 
demonstrates that even a low percentage of free riders (i.e. 10%) prevents the FP 
dissemination from reaching full coverage. This means that some messages are discarded 
by free riders and therefore some nodes (e.g. leaves) are unable to get such messages 
through alternative edges. Clearly, this effect on coverage is amplified by a higher 
percentage of free riders. More specifically, the coverage is reduced by [0.19%, 0.98%, 
2.43%] for [10%, 20%, 30%] of free riders. It is worth noting that, even if the 
configuration with 10% of free riders is unable to get full coverage, for some overhead 
values the coverage is marginally better than without free riders (see the zoom area in 
Figure 28a). The main reason behind this behavior is cache efficiency. In fact, due to 
the presence of free riders, a lower number of messages is delivered and the caches 
implemented in each node are able to operate with a slightly higher efficiency. This 
leads to fewer duplicated messages and hence a higher coverage for a given overhead 
value. 

Figure 28b shows the effect of free riding on DDF2. In this case, the coverage is 
reduced by [0.18%, 1%, 2.4%]. This means that, in terms of best coverage, free 
riding has the same effect on both dissemination algorithms. In the case of DDF2, 10% 
of free riders reduces the coverage for all the significant overhead values. This means 
that with DDF2 the cache efficiency improvement seen with FP does not occur. This 
result is in accordance with the characteristics of the degree dependent algorithms. 


7.5.2. Scale-Free Networks 

Scale-free networks deal with free riders better than random networks (Figure 29). In 
fact, with 10% of free riders both FP and DDF2 are still able to get full dissemination. 
More specifically, the coverage reduction is [0%, 0.74%, 2.24%] for FP and [0%, 0.79%, 
2.37%] for DDF2. It is worth noting that, also in this case, FP and DDF2 obtain results 
that are comparable. This means that, if we limit our analysis to FP and DDF2, the 
effect of free riding on coverage is more dependent on the network topology than on the 
dissemination algorithm. 


7.5.3. Small-World Networks 

The effect of free riding on small-world networks is twofold (Figure 30). On the one 
hand, a small amount (i.e. 10%) of free riders is able to prevent full dissemination. On 
the other hand, for higher levels of free riding (i.e. 20% and 30%) the reduction in 
coverage is more limited than with random and scale-free topologies. As in the previous 
cases, FP and DDF2 obtain results that are very close, that is, [0.08%, 0.41%, 1.13%] for 
FP and [0.07%, 0.40%, 1.07%] for DDF2. 


7.5.4. K-regular Networks 

K-regular networks are the topology least affected by free riding (i.e. FP [0%, 0%, 
0.11%] and DDF2 [0%, 0%, 0.10%]). As shown in Figure 31, for up to 20% of free riders 
both dissemination algorithms are able to get full dissemination. With 30%, the 
reduction of coverage is negligible. This good resistance to free riding is given by the 
uniform structure of k-regular networks. In fact, in the absence of hubs and leaf nodes, 
the disconnection of nodes from the network (due to free riders) is less likely. 

8. Discussion 

In this section, the main results from previous performance evaluation are discussed 
more fully. 

8.1. On the Performance of Gossip 

We already mentioned that the considered schemes allow disseminating messages in 
an unstructured overlay, whatever its topology, at the cost of some redundancy in the 
transmission of messages. Degree dependent gossip-based protocols outperform other 
standard dissemination schemes in most cases, mostly in terms of coverage. In particular, 
DDF2 generally performs better than DDF1. However, the choice of the dissemination 
protocol might be based on the topology of the P2P unstructured overlay. In fact, results 
show that the degree dependent gossip protocols work well in random graphs and 
scale-free networks, where the structure of the topology is purely based on the degree 
setting and the connections among nodes are arbitrary. Conversely, in regular networks 
there is no need to employ a degree dependent gossip protocol, since there are no nodes 
with a degree higher than others; hence, this approach cannot provide important benefits. 

Results show that, when dealing with small-world networks, the performance 
improvement of the degree-dependent gossip scheme is present but lower than in random 
and scale-free topologies. This is probably due to the topology of a small-world 
network, which is typically formed by a number of local links (i.e., links with nodes that 
are in turn neighbors of each other) and by a few “long distance” links that connect 
a node with others placed in other portions of the network (i.e., apart from the link 
that connects two nodes, say x and y, alternative paths from x to y have much higher 
distances). Gossip protocols select receivers randomly, without exploiting the presence 
of such long distance links. Indeed, in these topologies an informed approach might be 
preferred, one that takes into consideration the peculiarity of links in order to speed up 
the dissemination process. 

There is a trade-off between delay and overhead (i.e. number of delivered messages 
over the minimum number of messages needed to obtain a complete coverage). Indeed, 
delay can be lowered in a gossip protocol if we increase the gossip probability or, if we are 
able to change the setting parameters of peers, by increasing the node degrees. In fact, 
this would give each peer more chances to disseminate a message. However, the higher 
the number of relayed messages, the higher the overhead. 
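
The overhead definition recalled above can be made concrete with a short sketch. The function names are hypothetical, and two modelling choices are assumptions: the minimum number of messages for complete coverage is taken as n-1 (one delivery per node other than the source), and the flooding count assumes a connected graph in which every node relays exactly once to all neighbors except the one it received the message from.

```python
def overhead_ratio(delivered, minimum_needed):
    """Delivered messages over the minimum needed for complete coverage."""
    return delivered / minimum_needed


def flooding_messages(n_nodes, n_edges):
    """Messages sent by ideal flooding on a connected graph (assumed model).

    The source sends on all its links, every other node on all links but
    the incoming one, hence 2E - (n - 1) messages in total.
    """
    return 2 * n_edges - (n_nodes - 1)


# Doubling the number of links roughly doubles the flooding overhead.
sparse = overhead_ratio(flooding_messages(500, 1000), 500 - 1)
dense = overhead_ratio(flooding_messages(500, 2000), 500 - 1)
```

Under these assumptions, adding links gives each peer more chances to relay a message but also multiplies the number of delivered copies, which is the trade-off discussed above.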

Finally, it is worth noticing that these results can be used to set a network in order 
to guarantee certain communication properties. Given an overlay topology, if a certain 
network coverage should be guaranteed during the dissemination of a message, then a 
corresponding overhead and delay have to be expected. 

8.2. On the Gossip Dissemination Algorithms 

In this work, the gossip dissemination algorithms that have been considered are quite 
simple and all of them are push-based. The rationale of this decision is that, in this 
paper, we aim to demonstrate that our simulation-based approach permits the analysis of 
a gossip protocol, determining when and how it is effective, given an underlying network 
topology. In our view, this is the prerequisite for the study of more complex dissemination 
strategies. In fact, the conclusions we reached in this paper can be easily extended to 
other kinds of dissemination protocols, including more complex ones such as pull or 
push-pull schemes. Thanks to the simulator structure, it is quite easy to implement 
new dissemination protocols (or to change the behavior of existing ones). This will 
allow a more comprehensive comparison of dissemination protocols when run on different 
underlying network topologies. 


8.3. On Churn 

Our analysis assumes that the time required for the dissemination of a message is 
an order of magnitude lower than the typical time needed to perceive a significant 
alteration of the network topology. Hence, due to churn, the neighbors of a node change 
over time, but such changes do not significantly alter the topology of an 
unstructured overlay during the execution of the gossip procedure. This is a common 
practice in the simulation of P2P systems. 

A further motivation for such a simplification is that the probabilistic dissemination 
of a message in an unstructured network does not require (as the name itself suggests) 
a reconfiguration of the structure of the overlay if a neighbor node leaves the network. 
This means that a node in charge of relaying a message decides the set of receiving nodes 
independently, and the failure of a neighbor node has (in general) no implications for 
other neighbors. A benefit of considering the overlay static during the dissemination of 
a single message is that it makes it easier to measure metrics of interest such as 
coverage, delay and overhead ratio. 


8.4. On Free Riding 

As expected, the presence of free riders in the overlay network has an impact on the 
evaluated metrics, but such an impact is quite limited. The previous evaluation 
shows that the network topology is much more important than the dissemination 
algorithm in limiting the effect of free riding on the dissemination coverage. In other 
words, in the design of P2P overlays, free riding is a parameter that must be considered 
due to its impact on performance. In practice, thanks to our simulator, it is possible to 
investigate the effect of free riding also on complex dissemination algorithms. Moreover, 
by setting specific dissemination probabilities for each node, it is even possible to study 
the effect of “partial” and “transient” free riders on the dissemination metrics. 
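
One possible way to express “partial” and “transient” free riders through per-node dissemination probabilities is sketched below. The function names, the scaling factors and the time-window model are hypothetical illustrations and do not describe how the simulator is actually configured.

```python
def relay_probabilities(nodes, base_p, full=frozenset(), partial=None):
    """Per-node forwarding probabilities (hypothetical configuration helper).

    full:    nodes that never relay (classic free riders).
    partial: mapping node -> factor in (0, 1) that scales the base probability.
    """
    partial = partial or {}
    probabilities = {}
    for node in nodes:
        if node in full:
            probabilities[node] = 0.0
        elif node in partial:
            probabilities[node] = base_p * partial[node]
        else:
            probabilities[node] = base_p
    return probabilities


def transient_probability(base_p, step, off_from, off_until):
    """A transient free rider stops relaying only during [off_from, off_until)."""
    return 0.0 if off_from <= step < off_until else base_p


probs = relay_probabilities(range(4), base_p=0.8, full={1}, partial={2: 0.5})
```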

9. Conclusions 


In this paper, we presented a study on highly intensive data dissemination protocols 
over unstructured P2P overlays. We compared four protocols, i.e. fixed probability, 
probabilistic broadcast and two degree dependent gossip-based protocols that change 
the probability of dissemination based on the degree of peers. Our results show that 
the degree dependent gossip-based protocols outperform in most cases other standard 
dissemination schemes, mostly in terms of coverage. 

We have provided a theoretical model that allows determining the threshold values 
required for a disseminated message to reach a giant component of the network. 
Moreover, thanks to the features of LUNES, a parallel and distributed simulator we built, 
we were able to study the impact of cache and TTL on data dissemination, in terms 
of coverage and delay, on disseminations in which the number of delivered messages is 
in the order of millions. Outcomes confirm that these technical strategies have an 
important influence on the performance of dissemination schemes. Moreover, 
dissemination schemes are quite effective in spreading information in P2P overlay 
networks, whatever their topology. 

A main result of this study is that, given a set of nodes and the need to design 
and create a P2P overlay, it is possible to understand how to set up the network 
in order to guarantee certain communication properties. In particular, our study shows 
that, given an overlay topology, a certain overhead and delay must be expected in order 
to obtain a particular network coverage during the dissemination of a message. Then, 
increasing the number of links decreases the average delays, at the cost of higher 
overheads (i.e. number of delivered messages over the minimum number of messages 
needed to obtain a complete coverage). Moreover, the selection of the dissemination 
protocol should be based on the topology (and vice versa). As mentioned, the degree 
dependent gossip approach works well in random graphs and scale-free networks. Indeed, 
the structure of these topologies is purely based on the degree setting, and the 
connections among nodes are arbitrary. Conversely, as expected, its improvements are 
negligible in regular networks, since there are no nodes with a degree higher than others. 
Similarly, in order to take advantage of the peculiarities of a small-world network, an 
informed approach should be preferred, one that, for instance, takes into account the 
distance among nodes. In fact, our results show that in the case of small-world networks 
the degree-dependent scheme improves performance, but less than in random and 
scale-free topologies. 

Finally, a preliminary evaluation of the impact of free riding on the studied metrics 
has been reported. The main outcome is that the presence of free riders has 
an impact on coverage, but such an impact is quite limited. More specifically, 
the effect of free riding on coverage is more dependent on the network 
topology than on the dissemination algorithm. 

Acronyms 

LUNES   Large Unstructured NEtwork Simulator 
FP      Fixed Probability (dissemination protocol) 
PB      Probabilistic Broadcast (dissemination protocol) 
DDF1    Degree Dependent Function 1 (dissemination protocol) 
DDF2    Degree Dependent Function 2 (dissemination protocol) 

[Panel titles: Dissemination protocols comparison: coverage; Dissemination protocols comparison: delay] 

Figure 3: Random graph networks, 500 nodes, 1000 edges, max diameter=10, TTL=16, cache=256. 

Figure 4: Random graph networks, 500 nodes, 1500 edges, max diameter=7, TTL=10, cache=256. 

Figure 5: Random graph networks, 500 nodes, 2000 edges, max diameter=6, TTL=8, cache=256. 

[Panel titles: Dissemination protocols comparison: coverage; Fixed Probability dissemination: coverage, different TTL values] 

Figure 6: Random graph networks, 500 nodes, 1000 edges, max diameter=10. a) TTL=16, b) cache=256. FP dissemination protocol. 

Figure 7: Random graph networks, 500 nodes, 1000 edges, max diameter=10, TTL=16, cache=256. 

Figure 8: Random graph networks, 500 nodes, 1500 edges, max diameter=7, TTL=10, cache=256. 

Figure 9: Random graph networks, 500 nodes, 2000 edges, max diameter=6, TTL=8, cache=256. 

Figure 10: Scale-free networks, 500 nodes, 997 edges, max diameter=7, TTL=10, cache=256. 

Figure 11: Scale-free networks, 500 nodes, 1494 edges, max diameter=5, TTL=7, cache=256. 

Figure 12: Scale-free networks, 500 nodes, 1990 edges, max diameter=4, TTL=6, cache=256. 

Figure 13: Scale-free networks, 500 nodes, 997 edges, max diameter=7, TTL=10, cache=256. 

Figure 14: Scale-free networks, 500 nodes, 1494 edges, max diameter=5, TTL=7, cache=256. 

Figure 15: Scale-free networks, cache=256. 

Figure 16: Small-world networks, 500 nodes, 1000 edges, rewiring probability=0.1, max diameter=13, TTL=17, cache=256. 

Figure 17: Small-world networks, 500 nodes, 1500 edges, rewiring probability=0.1, max diameter=9, TTL=12, cache=256. 

Figure 18: Small-world networks, 500 nodes, 2000 edges, rewiring probability=0.1, max diameter=7, TTL=10, cache=256. 

Figure 19: Small-world networks, 500 nodes, 1000 edges, rewiring probability=0.1, max diameter=13, TTL=17, cache=256. 

Figure 20: Small-world networks, 500 nodes, 1500 edges, rewiring probability=0.1, max diameter=9, TTL=12, cache=256. 

Figure 21: Small-world networks, 500 nodes, 2000 edges, rewiring probability=0.1, max diameter=7, TTL=10, cache=256. 

Figure 22: K-regular networks, 500 nodes, 1000 edges, max diameter=8, TTL=11, cache=256. 

Figure 23: K-regular graphs, 500 nodes, 1500 edges, max diameter=6, TTL=8, cache=256. 

Figure 24: K-regular graphs, cache=256. 

Figure 25: K-regular networks, 500 nodes, 1000 edges, max diameter=8, TTL=11, cache=256. 

Figure 26: K-regular graphs, 500 nodes, 1500 edges, max diameter=6, TTL=8, cache=256. 

Figure 27: K-regular graphs, 500 nodes, cache=256. 

[Panel titles: Free riders impact on FP dissemination: coverage; Free riders impact on DDF2 dissemination: coverage] 

Figure 28: Random networks, 500 nodes, 1500 edges, max diameter=7, TTL=10, cache=256. a) Fixed Probability vs. b) DDF2. 

Figure 29: Scale-free networks, 500 nodes, 1494 edges, cache=256. a) Fixed Probability vs. b) DDF2. 

[Panel titles: Free riders impact on FP dissemination: coverage; Free riders impact on DDF2 dissemination: coverage] 

Figure 30: Small-world networks, 500 nodes, 1500 edges, rewiring probability=0.1, max diameter=9, TTL=12, cache=256. a) Fixed Probability vs. b) DDF2. 

Figure 31: K-regular graphs, 500 nodes, 1500 edges, cache=256. a) Fixed Probability vs. b) DDF2. 